Abstract. Assume that a finite set of points is randomly sampled
from a subspace of a metric space. Recent advances in computa- 
tional topology have provided several approaches to recovering the 
geometric and topological properties of the underlying space. In 
this paper we take a statistical approach to this problem. We as- 
sume that the data is randomly sampled from an unknown proba- 
bility distribution. We define two filtered complexes with which we 
can calculate the persistent homology of a probability distribution. 
Using statistical estimators for samples from certain families of dis- 
tributions, we show that we can recover the persistent homology 
of the underlying distribution. 

1. Introduction 

There is growing interest in characterizing topological features of 
data sets. Given a finite set, sometimes called point cloud data (PCD), 
that is randomly sampled from a subspace X of some metric space, 
one hopes to recover geometric and topological properties of X. Using 
random samples, P. Niyogi, S. Smale and S. Weinberger [NSW05] show 
how to recover the homology of certain submanifolds. In [CCSL06] the 
homotopy-type of certain compact subsets is recovered. 

A finer descriptor, developed by H. Edelsbrunner, D. Letscher, A. 
Zomorodian and G. Carlsson, is that of persistent homology [ELZ02, 
ZC05]. While it is not a homotopy invariant, it is stable under small 
changes [CSEH05]. Using the PCD and the metric, one can construct 
a filtered simplicial complex which approximates the unknown space 
X [dSC04, CZCG04]. This leads naturally to a spectral sequence. 
What is unusual is that the homology of the start of the spectral
sequence is uninteresting, and so is what it converges to. Nevertheless,
the intermediate homology, called persistent homology, is of interest.
It can be described using barcodes, which are analogues of the Betti
numbers.

The aim of this paper is to take a statistical approach to these ideas. 
We assume that the data is sampled from a manifold with respect to 
a probability distribution. Given such a distribution, we construct two 
filtered chain complexes: the Morse complex, and the Cech complex. 
For most of the distributions we consider, these complexes are related 
by Alexander duality. Using persistent homology, one can calculate the 
corresponding Betti barcodes, which provide a topological description 
of the distribution. In the case of the Cech complex we define a Betti-0
function. We apply these methods to several parametric families of
distributions: the von Mises, von Mises-Fisher, Watson and Bingham
distributions on $S^{p-1}$ and the matrix von Mises distribution on $SO(3)$.

Given a sample, it is assumed that the underlying distribution is 
unknown, but that it is one of a parametrized family. We use statistical 
techniques to estimate the parameter. These are then used to estimate 
the barcodes. As a result, we prove that we can recover the persistent 
homology of the underlying distribution. 

Theorem 1.1. Let $x_1, \ldots, x_n$ be a sample from $S^{p-1}$ according to
the von Mises-Fisher distribution with fixed concentration parameter
$\kappa > 0$. Given the sample, let $\hat\kappa$ be the maximum likelihood estimator
for $\kappa$ (which is given by formula (6.1)). Let $\beta_{\hat\kappa}$ and $\beta_\kappa$ denote the
Betti barcodes for the persistent homology of the densities associated
with $\hat\kappa$ and $\kappa$ using either the Morse or the Cech filtration. Finally
let $E(\cdot)$ denote the expectation, and $\mathcal{D}$ denote the barcode metric (see
Definition 3.5). Then,

$E(\mathcal{D}(\beta_{\hat\kappa}, \beta_\kappa)) \le C(\kappa)\, n^{-1/2}$ ,

as $n \to \infty$, for some constant $C(\kappa)$.

We also show that the classical theory of spacings [Pyk65] can be used
to calculate the exact expectations of the Betti barcodes for samples
from the uniform distribution on $S^1$, together with their asymptotic
behavior.

As part of our results, we show that the Morse filtrations of our distributions
each correspond to a relative CW-structure for the underlying
spaces. The von Mises and von Mises-Fisher distributions correspond
to the decomposition $S^{p-1} \approx * \cup_* D^{p-1}$, the Watson distribution corresponds
to $S^{p-1} \approx S^{p-2} \cup_{\mathrm{Id} \sqcup \mathrm{Id}} (D^{p-1} \sqcup D^{p-1})$, and the Bingham
distribution corresponds to
$S^{p-1} \approx * \cup_{\mathrm{Id} \sqcup \mathrm{Id}} (D^1 \sqcup D^1) \cup_{\mathrm{Id} \sqcup \mathrm{Id}} (D^2 \sqcup D^2) \cup \ldots \cup_{\mathrm{Id} \sqcup \mathrm{Id}} (D^{p-1} \sqcup D^{p-1})$.
Finally, the Morse filtration on the matrix
von Mises distribution on $SO(3)$ corresponds to the decomposition
$\mathbb{RP}^2 \cup_f D^3$ where $f : S^2 \to \mathbb{RP}^2$ identifies antipodal points. Interestingly,
the last decomposition is obtained by using the Hopf fibration
$S^0 \to S^3 \to \mathbb{RP}^3$.

A summary of the paper goes as follows. In Section 2, we go over 
the background and notation used in this paper. We review both the 
statistical and the topological terminologies. In Section 3 we discuss 
filtrations and persistent homology and we develop two filtrations for
densities. In Section 4 we use the theory of spacings to give exact estimates
of the persistent homology of uniform samples on $S^1$. In Section
5 we calculate the persistent homology of some standard parametric
families of densities on $S^{p-1}$ and $SO(3)$. In Section 6 we use maximum
likelihood estimators to recover the persistent homology of the
underlying density.

2. Background and notation 

In an attempt to make this article accessible to a broad audience, 
we define some of the basic statistical and topological terms we will be 
using. 

2.1. Statistics. Given a manifold $\mathcal{M}$ with Radon measure $\nu$, a density
is a function $f : \mathcal{M} \to [0, \infty]$ such that $f\,d\nu$ is a probability distribution
on $\mathcal{M}$ with $\int_{\mathcal{M}} f\,d\nu = 1$. A common statistical example is to take
$\mathcal{M} = \mathbb{R}^p$, and $d\nu$ to be the $p$-dimensional Lebesgue measure. A density
in this case would be a nonnegative function that integrates to unity.
We can also take $\mathcal{M} = S^{p-1}$, the $(p-1)$-dimensional unit sphere,
with $d\nu$ being the $(p-1)$-dimensional spherical measure. In this case
a density is referred to as a directional density. For $\mathcal{M}$ a compact
connected orientable Riemannian manifold, $d\nu$ would be the measure
induced by the Riemannian structure.

In statistics, we think of a family of probability densities parametrized
accordingly

(2.1) $\{ f_\vartheta : \vartheta \in \Theta \}$ ,

where $\vartheta$ is called a parameter and $\Theta$ is called the parameter space.
The parameter space can be quite general and if it is some subset 
of a finite-dimensional vector space, then (2.1) is referred to as a para- 
metric family of densities, otherwise it is known as a nonparametric 
family of densities. Subsequent to this, the corresponding statistical 
problem will be referred to as either a parametric statistical procedure, 
or, a nonparametric statistical procedure, depending on whether we 
are dealing with a parametric, or nonparametric family of densities, 
respectively. 






PETER BUBENIK AND PETER T. KIM 



Some parametric examples are in order. Let $\mathcal{M} = \mathbb{R}^p$ and consider
the normal family of location scale probability densities,

(2.2) $f_{\mu,\sigma}(x) = (2\pi\sigma^2)^{-p/2} \exp\{ -\|x - \mu\|^2 / (2\sigma^2) \}$ ,

where $\mu, x \in \mathbb{R}^p$ and $\sigma^2 \in [0, \infty)$. Letting $\vartheta = (\mu, \sigma^2)$, we note that
this parametric problem has $\Theta = \mathbb{R}^p \times [0, \infty)$ as its parameter space.

If we take $\mathcal{M} = S^{p-1}$, a well known example of a directional density,
and one that will be used in this paper, is given by

(2.3) $f_{\mu,\kappa}(x) = c(\kappa) \exp\{ \kappa\, x^t \mu \}$ ,

where $\mu, x \in S^{p-1}$, $\kappa \in [0, \infty)$, $c(\kappa)$ is the normalizing constant and
superscript "t" denotes transpose. The distribution arising from $f_{\mu,\kappa}$ is
called the von Mises-Fisher distribution where this parametric problem
has $\Theta = S^{p-1} \times [0, \infty)$ as its parameter space.
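As an illustration of (2.3) in the simplest case $p = 2$, the following sketch evaluates the von Mises density on the circle. It assumes the standard normalizing constant $c(\kappa) = 1/(2\pi I_0(\kappa))$ with respect to arc length, where $I_0$ is the modified Bessel function, computed here from its power series. The function names are ours and purely illustrative.

```python
import math

def bessel_i0(kappa, terms=60):
    """Modified Bessel function I_0 via its power series (adequate for moderate kappa)."""
    return sum((kappa / 2.0) ** (2 * m) / math.factorial(m) ** 2 for m in range(terms))

def von_mises_density(x, mu, kappa):
    """c(kappa) * exp(kappa * x^t mu) for unit vectors x, mu in the plane (p = 2)."""
    dot = x[0] * mu[0] + x[1] * mu[1]
    return math.exp(kappa * dot) / (2.0 * math.pi * bessel_i0(kappa))

# Riemann-sum check that the density integrates to one over the circle.
N = 2000
mu = (1.0, 0.0)
total = sum(
    von_mises_density((math.cos(2 * math.pi * j / N), math.sin(2 * math.pi * j / N)), mu, 2.0)
    for j in range(N)
) * (2 * math.pi / N)
```

With $\kappa = 0$ the density reduces to the uniform density $1/(2\pi)$, which is the case treated in Section 4.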

Somewhat related to the above is the situation where $\mathcal{M} = SO(p)$,
the space of $p \times p$ rotation matrices. Let

(2.4) $f_{\mu,\kappa}(x) = c(\kappa) \exp\{ \kappa \operatorname{tr} x^t \mu \}$ ,

where $\mu, x \in SO(p)$, $\kappa \in [0, \infty)$ and $c(\kappa)$ is the normalizing constant.
The distribution arising from $f_{\mu,\kappa}$ is called the matrix von Mises-Fisher
distribution where this parametric problem has $\Theta = SO(p) \times [0, \infty)$ as
its parameter space.

A sample $X_1, X_2, \ldots, X_N$ is a sequence of independent and identically
distributed random quantities on $\mathcal{M}$ drawn according to the density
$f_\vartheta$ for some fixed but unknown $\vartheta \in \Theta$. The parameter of interest
would be the fixed but unknown parameter $\vartheta$, or, more generally, some
transformation $\tau(\vartheta)$ thereof. Statistically, we want to find an estimator
$\hat\tau = \hat\tau(X_1, \ldots, X_N)$ of $\tau(\vartheta)$. Given some metric $\gamma$ on $\tau(\Theta)$, the
performance of the estimator is evaluated relative to this metric in
expectation with respect to the joint probability density of the sample,

(2.5) $E_\vartheta\, \gamma(\hat\tau, \tau) = \int_{\mathcal{M}} \cdots \int_{\mathcal{M}} \gamma(\hat\tau, \tau)\, f_\vartheta \cdots f_\vartheta \, d\nu \cdots d\nu$ ,

where the above represents an $N$-fold integration and $\vartheta \in \Theta$. Thus the
relative merit of one estimator over another estimator can be evaluated
using (2.5) in a statistical decision theory context, see [Ber85].

There are a wide variety of different distributions for a given mani- 
fold, as well as sample spaces that are different manifolds. References 
that discuss these topics can be found in the books by Mardia and 
Jupp [MJ00] and Chikuse [Chi03]. Furthermore, although nonparametric
statistical procedures on compact Riemannian manifolds are
available [Hen90, Efr00, AK05, KK00], in this paper we will deal with
parametric statistical procedures.

2.2. Topology. Let $R$ be a commutative ring with identity. (In fact,
we will only be interested in cases where $R$ is a field, in which case
$R$-modules are vector spaces and $R$-module morphisms are linear maps
of vector spaces.)

Definition 2.1. A chain complex over $R$ is a sequence of $R$-modules
$\{C_i\}_{i \in \mathbb{Z}}$ together with $R$-module morphisms $d_i : C_i \to C_{i-1}$ called
differentials such that $d_i \circ d_{i+1} = 0$. This condition is often abbreviated
to $d^2 = 0$. The elements of $C_n$ are called $n$-chains. This chain complex
is denoted by $(C, d)$.

Definition 2.2. An (abstract) simplicial complex K is a set of finite, 
ordered subsets of an ordered set K, such that 

• the ordering of the subsets is compatible with the ordering of 
K, and 

• if $\alpha \in K$ then any nonempty subset of $\alpha$ is also an element of
K.

The elements of $K$ with $n + 1$ elements are called $n$-simplices and
denoted $K_n$.

Definition 2.3. Given a simplicial complex $K$, the chain complex on
$K$, denoted $(C_*(K), d)$, is defined as follows. Let $C_n(K)$ be the free
$R$-module with basis $K_n$. We define the differential on $K_n$ and extend
it to $C_n(K)$ by linearity. For $[v_0, \ldots, v_n] \in K_n$ define

$d[v_0, \ldots, v_n] = \sum_i (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_n]$,

where $\hat{v}_i$ denotes that the element $v_i$ is omitted from the sequence.
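The differential of Definition 2.3 is easy to experiment with. The sketch below is only illustrative: it represents a simplex as a tuple of ordered vertices and a chain as a dictionary from simplices to integer coefficients (both representations are our own choice), and it checks $d^2 = 0$ on a single triangle.

```python
def boundary(simplex):
    """Return d[v_0,...,v_n] as a dict {face: coefficient}."""
    faces = {}
    if len(simplex) == 1:
        return faces  # the boundary of a vertex is zero
    for i in range(len(simplex)):
        face = simplex[:i] + simplex[i + 1:]  # omit the i-th vertex
        faces[face] = faces.get(face, 0) + (-1) ** i
    return faces

def boundary_of_chain(chain):
    """Extend the boundary to a chain {simplex: coefficient} by linearity."""
    result = {}
    for simplex, coeff in chain.items():
        for face, sign in boundary(simplex).items():
            result[face] = result.get(face, 0) + coeff * sign
    # drop faces whose coefficients cancelled
    return {s: c for s, c in result.items() if c != 0}

triangle = (0, 1, 2)
d1 = boundary(triangle)       # [1,2] - [0,2] + [0,1]
d2 = boundary_of_chain(d1)    # every vertex cancels, so this is the zero chain
```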

For $n \ge 0$, the standard $n$-simplex is the $n$-dimensional polytope in
$\mathbb{R}^{n+1}$, denoted $\Delta^n$, whose vertices are given by the standard basis vectors
$e_0, \ldots, e_n$. It is just the convex hull of the standard basis vectors;
that is

(2.6) $\Delta^n = \{ x = \sum_{i=0}^n a_i e_i \mid a_i \ge 0 \text{ and } \sum_{i=0}^n a_i = 1 \}$.

There are inclusion maps

(2.7) $\delta_i : \Delta^n \to \Delta^{n+1}$

(called the $i$-th face inclusion) given by $\delta_i(x_0, \ldots, x_n) = (x_0, \ldots, x_{i-1}, 0, x_i, \ldots, x_n)$
for $0 \le i \le n + 1$.

Definition 2.4. Let $X$ be a topological space. For $n \ge 0$, let $C_n(X)$
be the free $R$-module generated by the set of continuous maps $\{\phi : \Delta^n \to X\}$.
For $n < 0$, let $C_n(X) = 0$. For $\phi : \Delta^n \to X$ let

(2.8) $d(\phi) = \sum_{i=0}^n (-1)^i\, \phi \circ \delta_i \in C_{n-1}(X)$.

Extend this by linearity to an $R$-module morphism $d : C_n(X) \to C_{n-1}(X)$.
One can check that $d^2 = 0$ so this defines a differential
and $C_*(X) = (\{C_n(X)\}_{n \in \mathbb{Z}}, d)$ is a chain complex, called the singular
chain complex.

Definition 2.5. Given a chain complex $(C, d)$, let $Z_k$ be the submodule
given by $\{x \in C_k \mid dx = 0\}$, called the $k$-cycles, and let $B_k$ be the
submodule given by $\{x \in C_k \mid \exists y \in C_{k+1} \text{ such that } dy = x\}$, called
the $k$-boundaries. Since $d^2 = 0$, $d(dy) = 0$ and thus $B_k \subset Z_k$. The
$k$-th homology of $(C, d)$, denoted $H_k(C, d)$, is given by the $R$-module
$Z_k / B_k$. The homologies $\{H_k(C, d)\}_{k \in \mathbb{Z}}$ form a chain complex with differential
0, denoted $H_*(C, d)$ and called the homology of $(C, d)$. If $R$ is
a principal ideal domain (for example, if $R$ is a field) and $H_k(C, d)$ is
finitely generated, then $H_k(C, d)$ is the direct sum of a free group and
a finite number of finite cyclic groups. The $k$-th Betti number $\beta_k(C, d)$
is the rank of the free group. If $R$ is a field, then $\beta_k(C, d)$ equals the
dimension of the vector space $H_k(C, d)$. If $X$ is a topological space
then $H_*(X)$ denotes the homology of the singular chain complex on $X$.
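When $R$ is a field, the Betti numbers of a finite simplicial chain complex reduce to linear algebra: $\beta_k = \dim C_k - \operatorname{rank} d_k - \operatorname{rank} d_{k+1}$. The following sketch works over the rationals, using exact fractions and plain Gaussian elimination; the data layout (a list of simplex lists, one per dimension) is an assumption made for the example.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a rational matrix (list of rows) by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def boundary_matrix(k_simplices, faces):
    """Matrix of d: C_k -> C_{k-1}; rows indexed by faces, columns by k-simplices."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(k_simplices) for _ in faces]
    for j, s in enumerate(k_simplices):
        for i in range(len(s)):
            mat[index[s[:i] + s[i + 1:]]][j] = (-1) ** i
    return mat

def betti_numbers(simplices_by_dim):
    """beta_k = dim C_k - rank d_k - rank d_{k+1}, with d_0 = 0."""
    top = len(simplices_by_dim)
    ranks = [0] * (top + 1)  # ranks[k] = rank of d_k; d_0 and d_top are zero
    for k in range(1, top):
        ranks[k] = rank(boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1]))
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top)]

# Hollow triangle (a simplicial circle) and the filled triangle (contractible).
circle = [[(0,), (1,), (2,)], [(0, 1), (0, 2), (1, 2)]]
disk = circle + [[(0, 1, 2)]]
betti_circle = betti_numbers(circle)
betti_disk = betti_numbers(disk)
```

The filled triangle illustrates Remark 2.7: it is contractible, so only the degree-0 Betti number is nonzero.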

Definition 2.6. Two spaces $X$ and $Y$ are said to be homotopy equivalent
(written $X \approx Y$) if there are maps $f : X \to Y$ and $g : Y \to X$
such that $g \circ f$ is homotopic to the identity map on $X$ and $f \circ g$ is
homotopic to the identity map on $Y$.

Remark 2.7. If $X \approx Y$ then $H_*(X) \cong H_*(Y)$. So if $X$ is a contractible
space (that is, a space which is homotopy equivalent to a point), then
$H_0(X) = R$ and $H_k(X) = 0$ for $k \ge 1$.

3. Filtrations and persistent homology
From now on, we will assume that the ground ring is a field F. 

3.1. Persistent homology. In Definition 2.5 we showed how to calculate
the homology of a chain complex. Given some additional information
on the chain complex, we will calculate homology in a more
sophisticated way. Namely, we will show how to calculate the persistent
homology of a filtered chain complex. This will detect homology
classes which persist through a range of values in the filtration.

Let $\bar{\mathbb{R}}$ denote the totally ordered set of extended real numbers $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$.
Then an increasing $\bar{\mathbb{R}}$-filtration on a chain complex
$(C, d)$ is a sequence of chain complexes $\{\mathcal{F}_r(C, d)\}_{r \in \bar{\mathbb{R}}}$ such that $\mathcal{F}_r(C, d)$
is a subchain module of $(C, d)$ and $\mathcal{F}_r(C, d) \subset \mathcal{F}_{r'}(C, d)$ whenever $r \le r' \in \bar{\mathbb{R}}$.
A chain complex together with an $\bar{\mathbb{R}}$-filtration is called an $\bar{\mathbb{R}}$-filtered
chain complex.

For a filtered chain complex, the inclusions $\mathcal{F}_j(C, d) \to \mathcal{F}_{j+l}(C, d)$
induce maps

$H_k(\mathcal{F}_j(C, d)) \to H_k(\mathcal{F}_{j+l}(C, d))$.

The image of this map is called the $l$-persistent $k$-th homology of $\mathcal{F}_j(C, d)$.

Let $Z_k^i = Z_k(\mathcal{F}_i(C, d))$ and let $B_k^i = B_k(\mathcal{F}_i(C, d))$. Assume $\alpha \in Z_k^i$.
Then $\alpha$ represents a homology class $[\alpha]$ in $H_k(\mathcal{F}_i(C, d))$. Furthermore
since $Z_k^i \subset Z_k^{i'}$ for all $i' \ge i$, $\alpha$ also represents a homology class in
$H_k(\mathcal{F}_{i'}(C, d))$, which we again denote $[\alpha]$. One possibility is that $[\alpha] \ne 0$
in $H_k(\mathcal{F}_i(C, d))$ but $[\alpha] = 0$ in $H_k(\mathcal{F}_{i'}(C, d))$ for some $i' > i$.

Assume $(C, d)$ is a chain complex with an $\bar{\mathbb{R}}$-filtration $\mathcal{F}_r(C, d)$ such
that

(3.1) $\bigcup_r \mathcal{F}_r(C, d) = (C, d)$ and $\bigcap_r \mathcal{F}_r(C, d) = 0$.

Equivalently, $\mathcal{F}_\infty(C, d) = (C, d)$ and $\mathcal{F}_{-\infty}(C, d) = 0$.

Lemma 3.1. Let $(C, d)$ be a filtered chain complex satisfying (3.1).
For any $n$-chain $\alpha \in (C, d)$, there is some smallest $r \in \bar{\mathbb{R}}$ such that
$\alpha \notin \mathcal{F}_{r'}(C, d)$ for all $r' < r$ and $\alpha \in \mathcal{F}_{r''}(C, d)$ for all $r'' > r$.

Proof. This follows from the definition of an $\bar{\mathbb{R}}$-filtration, the assumption
(3.1), and the linear ordering of $\bar{\mathbb{R}}$. □

Lemma 3.2. For any $n$-cycle $\alpha \in Z_n$, the set of all $r \in \bar{\mathbb{R}}$ such that
$0 \ne [\alpha] \in H_n(\mathcal{F}_r(C, d))$ is either empty or is an interval.

Proof. Let $\alpha \in Z_n$, and let $r_1$ be the corresponding value given by
Lemma 3.1.

If there is some $\beta \in C_{n+1}$ such that $d\beta = \alpha$ then again let $r_2$ be the
corresponding value given by Lemma 3.1. Since $\beta \in \mathcal{F}_j(C, d)$ implies
that $d\beta \in \mathcal{F}_j(C, d)$, it follows that $r_2 \ge r_1$. Thus $\alpha$ represents a nonzero
homology class in $\mathcal{F}_r(C, d)$ exactly when $r$ is in the (possibly empty)
interval beginning at $r_1$ and ending at $r_2$. This interval contains $r_1$ if
and only if $\alpha \in \mathcal{F}_{r_1}(C, d)$, and it does not contain $r_2$ if and only if
$\beta \in \mathcal{F}_{r_2}(C, d)$.

If $\alpha$ is not a boundary then $\alpha$ represents a nonzero homology class
in $\mathcal{F}_r(C, d)$ exactly when $r$ is in the interval $\{x \mid x \ge r_1\}$ or $\{x \mid x > r_1\}$
beginning at $r_1$. Again this interval contains $r_1$ if and only if
$\alpha \in \mathcal{F}_{r_1}(C, d)$. □

Definition 3.3. For $\alpha \in Z_k$ define the persistence $k$-homology interval
represented by $\alpha$ to be the interval given by Lemma 3.2. Denote it by
$I_\alpha$.

Definition 3.4. Define a Betti-$k$ barcode to be a set of intervals$^1$
$\{J_\alpha\}_{\alpha \in S \subset Z_k}$ such that

• $J_\alpha$ is a subinterval of $I_\alpha$, and

• for all $r \in \bar{\mathbb{R}}$, $\{[\alpha] \mid \alpha \in S, r \in J_\alpha\}$ is an $\mathbb{F}$-basis for $H_k(\mathcal{F}_r(C, d))$.

We will sometimes use $\beta_k$ to denote a Betti-$k$ barcode.

The set of barcodes has a metric [CZCG04] defined as follows. 

Definition 3.5. Given an interval $J$, let $\ell(J)$ denote its length. Given
two intervals $J$ and $J'$, the symmetric difference, $\Delta(J, J')$, between
them is the one-dimensional measure of $J \cup J' - J \cap J'$. Given two
barcodes $\{J_\alpha\}_{\alpha \in S}$ and $\{J'_{\alpha'}\}_{\alpha' \in S'}$, a partial matching, $M$, between the
two sets is a subset of $S \times S'$ where each $\alpha$ and $\alpha'$ appears at most
once. Define

$\mathcal{D}(\{J_\alpha\}_{\alpha \in S}, \{J'_{\alpha'}\}_{\alpha' \in S'}) = \min_M \left( \sum_{(\alpha, \alpha') \in M} \Delta(J_\alpha, J'_{\alpha'}) + \sum_{\alpha \notin M_1} \ell(J_\alpha) + \sum_{\alpha' \notin M_2} \ell(J'_{\alpha'}) \right)$,

where the minimum is taken over all partial matchings, and $M_i$ is the
projection of $M$ to $S_i$ (here $S_1 = S$ and $S_2 = S'$). This defines a quasi-metric
(since its value may be infinite). If desired, it can be converted into a metric.

3.2. Persistent homology from point cloud data. Let $(\mathcal{M}, \rho)$ be
a manifold with a metric $\rho$. Let $X = \{x_1, x_2, \ldots, x_n\} \subset \mathcal{M}$. $X$ is called
point cloud data. One would like to be able to obtain information on $\mathcal{M}$
from $X$. If $X$ contains sufficiently many uniformly distributed points
one may be able to construct a complex from $X$ that in some sense
reconstructs $\mathcal{M}$.

One such construction is the following $\bar{\mathbb{R}}$-filtered simplicial complex
called the Cech complex. Recall that we are working over a ground
field $\mathbb{F}$. Let $C_*(X)$ be the largest simplicial complex on the ordered
vertex set $X$. That is, $C_0(X) = X$ and for $k \ge 1$, $C_k(X)$ consists of the
ordered subsets of $X$ with $k + 1$ elements. Now filter this simplicial
complex (along $\bar{\mathbb{R}}$) as follows. Given $r < 0$, define $\mathcal{F}^C_r(C_n(X)) = 0$ for
all $n$. Let $B_r(x)$ denote the ball of radius $r$ centered at $x$. For $r \ge 0$
and $k \ge 1$, define $\mathcal{F}^C_r(C_k(X))$ to be the $\mathbb{F}$-vector space whose basis is
the $k$-simplices $[x_{i_0}, \ldots, x_{i_k}]$ such that $\bigcap_{j=0}^k B_r(x_{i_j}) \ne \emptyset$. We remark
that there are fast algorithms for computing $\mathcal{F}^C_r(C_k(X))$.$^2$ $\mathcal{F}^C_r(C_*(X))$
is called the $r$-Cech complex. It is the nerve of the collection of balls
$\{B_r(x_i)\}_{i=1}^n$, and its geometric realization is homotopy equivalent to
the union of these balls.

1 In Section 3.3 we will see that using the Cech filtration, the Betti-0 barcode of
manifolds will have uncountably many intervals, so we will define a more appropriate
descriptor, the Betti-0 function. In Section 4 it will also be useful to convert
finite Betti barcodes to functions so that we can analyze limiting and asymptotic
behavior.

A related construction is the Rips complex. For each $r$, the $r$-Rips
complex, $\mathcal{F}^R_r(C_*(X))$, is the largest simplicial complex containing
$\mathcal{F}^C_r(C_1(X))$. That is, $\mathcal{F}^R_r(C_k(X))$ is the $\mathbb{F}$-vector space whose basis is
the set of $k$-simplices $[x_{i_0}, \ldots, x_{i_k}]$ such that $\rho(x_{i_j}, x_{i_\ell}) \le r$ for all pairs
$0 \le j, \ell \le k$.
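In the Rips description a simplex belongs to the $r$-Rips complex exactly when its largest pairwise distance is at most $r$, so each simplex has a single filtration value. The sketch below computes that value, and the value of a chain, for points on a circle of total length one; the metric helper and the list-of-pairs chain representation are assumptions made for illustration.

```python
from itertools import combinations

def circle_distance(u, v):
    """Shortest arc length between two points of a circle of total length one."""
    d = abs(u - v) % 1.0
    return min(d, 1.0 - d)

def rips_value(simplex, dist):
    """Smallest r at which the simplex enters the Rips filtration."""
    return max((dist(a, b) for a, b in combinations(simplex, 2)), default=0.0)

def rips_chain_value(chain, dist):
    """Filtration value of a chain, given as a list of (coefficient, simplex)."""
    return max(rips_value(s, dist) for coeff, s in chain if coeff != 0)

points = [0.0, 0.1, 0.4, 0.7]
edge_values = {(a, b): rips_value((a, b), circle_distance)
               for a, b in combinations(points, 2)}
# The 1-cycle running once around the circle through consecutive points.
cycle = [(1, (0.0, 0.1)), (1, (0.1, 0.4)), (1, (0.4, 0.7)), (1, (0.7, 0.0))]
cycle_value = rips_chain_value(cycle, circle_distance)
```

Here the cycle enters the filtration at the largest gap between consecutive points, which is the quantity that reappears as the largest spacing in Section 4.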

Using either of these filtered simplicial complexes, one obtains a filtered
chain complex as follows. Let $\Delta_*(C_*(X))$ be the chain complex on
$C_*(X)$. Filter this over $\bar{\mathbb{R}}$ by letting

$\mathcal{F}_r(\Delta_*(C_*(X))) = \Delta_*(\mathcal{F}_r(C_*(X)))$, where $\mathcal{F}_r = \mathcal{F}^C_r$ or $\mathcal{F}^R_r$.

To simplify the notation, we write $\Delta_k(X) := \Delta_k(C_*(X))$. We remark
that these filtrations satisfy (3.1):

$\bigcup_r \mathcal{F}_r(\Delta_*(X)) = \Delta_*(X)$ and $\bigcap_r \mathcal{F}_r(\Delta_*(X)) = 0$.

Let $\alpha$ be an $n$-chain. By Lemma 3.1 we know that there is some $r \in \bar{\mathbb{R}}$
such that $\alpha \notin \mathcal{F}_{r'}(\Delta_n(X))$ for all $r' < r$ and $\alpha \in \mathcal{F}_{r''}(\Delta_n(X))$ for all
$r'' > r$. In fact,

Lemma 3.6. Consider an $n$-chain, $\alpha = \sum_{i=1}^m a_i (x_{i_0}, \ldots, x_{i_n})$. For the
Cech filtration let

$r = \max_{i=1 \ldots m} \min\{ r_i \mid \exists x \text{ such that } B_{r_i}(x) \ni x_{i_0}, \ldots, x_{i_n} \}$,

and for the Rips filtration let

$r = \max_{i=1 \ldots m} \max_{j \ne k} \rho(x_{i_j}, x_{i_k})$.

Then $\alpha \notin \mathcal{F}_{r'}(\Delta_n(X))$ for all $r' < r$ and $\alpha \in \mathcal{F}_{r''}(\Delta_n(X))$ for all
$r'' \ge r$.

2 The balls of radius $r$ centered at the points $\{x_{i_j}\}$ have nonempty intersection
if and only if there is a ball of radius $r$ containing the points $\{x_{i_j}\}$. There are fast
algorithms for the smallest enclosing ball problem [FGK03, Gac06].

If $\alpha$ is an $n$-cycle then by Lemma 3.2 there is a (possibly empty) persistence
$n$-homology interval corresponding to $\alpha$. Applying Lemma 3.6
to $\alpha$ and, if there is some $\beta \in \Delta_{n+1}(X)$ such that $d\beta = \alpha$, applying
Lemma 3.6 to $\beta$, we get the following.

Lemma 3.7. Given an $n$-cycle $\alpha$, the persistence $n$-homology interval
associated to $\alpha$ is either empty or has the form $[r_1, r_2)$ or $[r_1, \infty]$.
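Since barcodes coming from point cloud data are finite collections of intervals, the barcode metric of Definition 3.5 can be evaluated directly on small examples. The sketch below does so by brute force over all partial matchings. It assumes every interval has finite endpoints and is given as a pair `(a, b)`; the exhaustive search is exponential in the number of intervals and is meant only to make the definition concrete, not to be an efficient algorithm.

```python
from functools import lru_cache

def length(interval):
    a, b = interval
    return b - a

def symmetric_difference(j1, j2):
    """One-dimensional measure of (J1 union J2) minus (J1 intersect J2)."""
    overlap = max(0.0, min(j1[1], j2[1]) - max(j1[0], j2[0]))
    return length(j1) + length(j2) - 2.0 * overlap

def barcode_distance(bars1, bars2):
    """Minimise the matching cost of Definition 3.5 over all partial matchings."""
    bars1, bars2 = tuple(bars1), tuple(bars2)

    @lru_cache(maxsize=None)
    def best(i, used):
        # `used` is a bitmask recording which intervals of bars2 are matched
        if i == len(bars1):
            # every unmatched interval of the second barcode costs its length
            return sum(length(b) for k, b in enumerate(bars2) if not (used >> k) & 1)
        # option 1: leave bars1[i] unmatched
        cost = length(bars1[i]) + best(i + 1, used)
        # option 2: match bars1[i] with an unused interval of bars2
        for k, b in enumerate(bars2):
            if not (used >> k) & 1:
                cost = min(cost, symmetric_difference(bars1[i], b) + best(i + 1, used | (1 << k)))
        return cost

    return best(0, 0)

example = barcode_distance([(0.0, 1.0), (0.0, 0.2)], [(0.0, 0.9)])
```

In the example the optimal matching pairs the two long intervals (cost 0.1) and leaves the short one unmatched (cost 0.2).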

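3.3. Persistent homology of densities. Let $f_\vartheta$ be a probability density
on a manifold $\mathcal{M}$ for some $\vartheta \in \Theta$. We will use $f_\vartheta$ to define two
increasing $\bar{\mathbb{R}}$-filtrations on $C_*(\mathcal{M})$, the singular chain complex on $\mathcal{M}$
(see Definition 2.4).

3.3.1. The Morse filtration. For $r \in \bar{\mathbb{R}}$, the excursion sets

(3.2) $\mathcal{M}_{\le r} = \{ x \in \mathcal{M} \mid f_\vartheta(x) \le r \}$,

(used in Morse theory [Mil63]) filter $\mathcal{M}$ over $\bar{\mathbb{R}}$. Hence they also provide
an $\bar{\mathbb{R}}$-filtration of the singular chain complex $C_*(\mathcal{M})$,

$\mathcal{F}^M_r(C_*(\mathcal{M})) = C_*(\mathcal{M}_{\le r})$,

which we call the Morse filtration. We remark that for all $k$,

$H_k(\mathcal{F}^M_r C_*(\mathcal{M})) = H_k(\mathcal{M}_{\le r})$.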

3.3.2. The Cech filtration. There is a dual increasing filtration to the 
Morse filtration which uses superlevel sets instead of sublevel sets. We 
modify this filtration slightly so that it mirrors the filtration on the 
Cech complex defined in Section 3.2, and we will call it the Cech filtration.
We do this since the filtrations on the Cech complex and the
related Rips complex are the main filtrations used in computations of
persistent homology.

Notice that in the Cech complex filtration all of the points in X, even 
distant outliers, appear when r = 0. So the Cech filtration starts with 
all of the points of M and the discrete topology, and then progressively 
connects the regions with decreasing density. 

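For $r < 0$ and all $k$, define $\mathcal{F}^C_r(C_k(\mathcal{M})) = 0$. For $r \ge 0$, let
$\mathcal{F}^C_r(C_0(\mathcal{M})) = C_0(\mathcal{M})$. Assume $k \ge 1$. Let

$\mathrm{Const}_k = \{ \phi : \Delta^k \to \mathcal{M} \mid \phi \text{ is constant} \} \subset C_k(\mathcal{M})$.

For $0 \le s \le \infty$, let

(3.3) $\mathcal{M}_{\ge s} = \{ m \in \mathcal{M} \mid f_\vartheta(m) \ge s \}$.

For $r \ge 0$, let

(3.4) $\mathcal{F}^C_r(C_k(\mathcal{M})) = \mathrm{Const}_k + C_k(\mathcal{M}_{\ge 1/r})$.

From this filtered chain complex we can calculate persistence $k$-homology
intervals and Betti-$k$ barcodes just as in Section 3.2.

Lemma 3.8. For $k \ge 1$,

$H_k(\mathcal{F}^C_r(C_*(\mathcal{M}))) = H_k(\mathcal{M}_{\ge 1/r})$.

Proof. By definition, $Z_k(\mathcal{F}^C_r C_*(\mathcal{M})) = \mathrm{Const}_k + Z_k C_*(\mathcal{M}_{\ge 1/r})$, and
$B_k(\mathcal{F}^C_r C_*(\mathcal{M})) = \mathrm{Const}_k + B_k C_*(\mathcal{M}_{\ge 1/r})$. So

$H_k(\mathcal{F}^C_r C_*(\mathcal{M})) = Z_k(C_*(\mathcal{M}_{\ge 1/r})) / B_k(C_*(\mathcal{M}_{\ge 1/r})) = H_k(\mathcal{M}_{\ge 1/r})$.

□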

Let $r \ge 0$. Recall the notation of Section 3.1: $Z_k^r = Z_k(\mathcal{F}^C_r(C_*(\mathcal{M})))$
and $B_k^r = B_k(\mathcal{F}^C_r(C_*(\mathcal{M})))$. To start, $Z_0^r = \mathbb{F}[\mathcal{M}]$. Then
$\mathcal{F}^C_r(C_1(\mathcal{M})) = \mathbb{F}[\{ \phi : \Delta^1 \to \mathcal{M} \mid \phi \text{ is constant, or } \operatorname{im} \phi \subset \mathcal{M}_{\ge 1/r} \}]$.

For two points $x, y \in \mathcal{M}$, there is some map $\phi : \Delta^1 \to \mathcal{M}$ such that
$\phi(0) = x$, $\phi(1) = y$ and $\operatorname{im}(\phi) \subset \mathcal{M}_{\ge 1/r}$, in which case $d\phi = x - y$, if
and only if $x$ and $y$ are in the same path component of $\mathcal{M}_{\ge 1/r}$. Thus

$H_0(\mathcal{F}^C_r(C_*(\mathcal{M}))) \cong \mathbb{F}[\mathcal{M}/\sim]$,

where $x \sim y$ if and only if $x$ and $y$ are in the same path component of
$\mathcal{M}_{\ge 1/r}$.

In the case where $\mathcal{M}_{\ge 1/r}$ is path-connected, $H_0(\mathcal{F}^C_r(C_*(\mathcal{M}))) = \mathbb{F}[\mathcal{M}/\mathcal{M}_{\ge 1/r}]$.
In particular $H_0(\mathcal{F}^C_0(C_*(\mathcal{M}))) = \mathbb{F}[\mathcal{M}/\mathcal{M}_{\ge \infty}]$. Since $f_\vartheta$ is a probability
density, $\mathcal{M}_{\ge \infty}$ has measure 0. Therefore almost all $m \in \mathcal{M}$ represent
a distinct homology class in $\mathcal{F}^C_0(C_*(\mathcal{M}))$ and there are uncountably
many 0-homology intervals. As a result the Betti-0 barcode is not a
good descriptor. In this section, we will describe how the 0-homology
intervals can be used to describe a Betti-0 function, in the case where
the density $f_\vartheta$ satisfies a continuity condition.

More generally, as long as $\mathcal{M} - \mathcal{M}_{\ge 1/r}$ is uncountable and $\mathcal{M}_{\ge 1/r}$ has
countably many path components, then almost all homology classes in
$H_0(\mathcal{F}^C_r(C_*(\mathcal{M})))$ have a unique representative. In this case we use this
as justification to consider only those homology classes with a unique
representative.

Assume that for all $r$, $\mathcal{M} - \mathcal{M}_{\ge 1/r}$ is uncountable and $\mathcal{M}_{\ge 1/r}$ has
countably many path components, and that the following continuity
condition holds for all $m \in \mathcal{M}$:

(3.5) $\forall \varepsilon > 0$, $\exists$ injective $\phi : [0, 1] \to \mathcal{M}$ s.t. $\phi(0) = m$ and $f_\vartheta(\phi(t)) > f_\vartheta(m) - \varepsilon$.

This condition holds if $f_\vartheta$ is continuous.

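Lemma 3.9. Each $m \in \mathcal{M}$ is a unique representative for $[m]$ for exactly
those values of $r \in \left[0, \frac{1}{f_\vartheta(m)}\right)$ or $r \in \left[0, \frac{1}{f_\vartheta(m)}\right]$.

Proof. Let $m \in \mathcal{M}$. Since $dm = 0$, $m \in Z_0^r$ for $r \ge 0$. Let $[m] \in
H_0(\mathcal{F}^C_r(C_*(\mathcal{M})))$ denote the homology class represented by $m$. By
definition $m \in \mathcal{M}_{\ge 1/r}$ if and only if $r \ge \frac{1}{f_\vartheta(m)}$. Thus $m$ is the unique
representative for $[m]$ for $r < \frac{1}{f_\vartheta(m)}$. By assumption, for any $\varepsilon > 0$ there
is an injective map $\phi : [0, 1] \to \mathcal{M}$ such that $\phi(0) = m$ and $f_\vartheta(\phi(t)) >
f_\vartheta(m) - \varepsilon$. Then $\phi \in \mathcal{F}^C_r(C_1(\mathcal{M}))$ where $r = \frac{1}{f_\vartheta(m) - \varepsilon}$. This implies that
for any $\varepsilon > 0$ there is a non-constant continuous map $\phi : \Delta^1 \to \mathcal{M}$
with $\phi(0) = m$ such that $\phi \in \mathcal{F}^C_{1/(f_\vartheta(m) - \varepsilon)}(C_1(\mathcal{M}))$. Hence $m$ is not a
unique representative for $[m]$ for $r > \frac{1}{f_\vartheta(m)}$. Therefore $m$ is a unique
representative for $[m]$ for either $r \in \left[0, \frac{1}{f_\vartheta(m)}\right)$ or $r \in \left[0, \frac{1}{f_\vartheta(m)}\right]$.
□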

Before we formally define the Betti-0 function, we give the following
intuitive picture. We draw each of our intervals $\left[0, \frac{1}{f_\vartheta(m)}\right)$ or $\left[0, \frac{1}{f_\vartheta(m)}\right]$
vertically starting at $r = 0$ and ending at $r = \frac{1}{f_\vartheta(m)}$. Furthermore
we order the intervals from left to right according to their length. In
fact we draw all of the intervals between $x = 0$ and $x = 1$, where the
$x$-axis is scaled according to the probability distribution $f_\vartheta\,d\nu$. The
increasing curve traced by the tips of the intervals will be called the
Betti-0 function.

Definition 3.10. Formally, define the Betti-0 function $\beta_0 : (0, 1] \times \Theta \to [0, \infty]$
as follows.$^3$ For $r \in [0, \infty]$, let

(3.6) $g_\vartheta(r) = \int_{\mathcal{M}_{\ge 1/r}} f_\vartheta \, d\nu$.

Since $f_\vartheta$ is a probability density, $g_\vartheta$ is an increasing function $g_\vartheta : [0, \infty] \to [0, 1]$
for each fixed $\vartheta \in \Theta$. Also recall that $\mathcal{M}_{\ge \infty}$ has measure 0
and by definition $\mathcal{M}_{\ge 0} = \mathcal{M}$. So $g_\vartheta(0) = 0$ and $g_\vartheta(\infty) = 1$. For
$0 < x \le 1$, let

(3.7) $\beta_0(x, \vartheta) = \inf_{g_\vartheta(r) \ge x} r$.

3 While our definition of $\beta_0$ below (3.7) is valid for $x = 0$, we get $\beta_0(0, \vartheta) = 0$.
This does not provide any information, and is furthermore inappropriate in cases
such as the von Mises distribution with $\kappa = 0$ (see Section 5.1 below) where $\beta_0(x, \vartheta)$
is constant and nonzero for $x > 0$.



A STATISTICAL APPROACH TO PERSISTENT HOMOLOGY 






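If $g_\vartheta$ is continuous and strictly increasing,$^4$ then

(3.8) $\beta_0(x, \vartheta) = g_\vartheta^{-1}(x)$ ,

for $\vartheta \in \Theta$. That is, $\beta_0(x, \vartheta)$ is the unique value of $r$ such that
$\int_{\mathcal{M}_{\ge 1/r}} f_\vartheta \, d\nu = x$.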
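To make (3.6) and (3.7) concrete, the following sketch approximates the Betti-0 function for a density that is known only through its values on a partition of $\mathcal{M}$ into cells of equal measure. The discretization is an assumption of the example, not part of the definition. Since $g_\vartheta$ is then a step function jumping at the values $1/f_\vartheta$, the infimum in (3.7) is found by accumulating mass from the highest density downwards.

```python
import math

def betti0_function(density_values, cell_measure, x):
    """Approximate beta_0(x) = inf{r : g(r) >= x} for a density sampled on
    cells of equal measure, where g(r) is the mass of the set {f >= 1/r}."""
    mass = 0.0
    for f in sorted(density_values, reverse=True):
        if f <= 0.0:
            break  # the remaining cells carry no mass; beta_0 is infinite here
        mass += f * cell_measure
        if mass >= x - 1e-12:  # small tolerance for accumulated rounding
            return 1.0 / f
    return math.inf

# A piecewise-constant density on four cells of measure 1/4 (total mass one).
values = [2.0, 1.0, 0.5, 0.5]
low = betti0_function(values, 0.25, 0.4)    # reached inside the densest cell
mid = betti0_function(values, 0.25, 0.6)
high = betti0_function(values, 0.25, 0.9)

# The uniform density on the circle with respect to arc length.
N = 1000
uniform = [1.0 / (2.0 * math.pi)] * N
flat = betti0_function(uniform, 2.0 * math.pi / N, 0.5)
```

For the uniform density the function is constant, in line with the remark in footnote 3 about the case $\kappa = 0$.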

3.3.3. Alexander duality. The Morse and Cech filtrations on $S^{p-1}$ are
related by Alexander duality. Let $f$ be a density on $S^{p-1}$. Assume
that $r \in \operatorname{im}(f)$ and that $r < \sup(f)$. Then $S^{p-1}_{f \le r}$ is a proper, nonempty
subset of $S^{p-1}$. Assume that $S^{p-1}_{f \le r}$ is compact and a neighborhood
retract.

Theorem 3.11 (Alexander duality for the Morse and Cech nitrations 
on Let H denote reduced homology, let ¥ be afield, and let s — K 

Hi(S p f l\;¥) = fl*- 2 -*(Sjg;F) = H^S^-F). 

4. Expected barcodes of PCD 

4.1. Betti barcodes of uniform samples on $S^1$. Let $f$ be the uniform
density on $S^1$. Let $X = \{X_1, \ldots, X_n\} \subset S^1$ be a sample drawn
according to $f$. $X$ is called the point cloud data. In this section we
consider the Betti barcodes obtained for the persistent homology of
$\mathcal{F}^R_r(\Delta_*(X))$, the Rips complex on $X$ (see Section 3.2). The metric we
use on $S^1$ is $\frac{1}{2\pi}$ times the shortest arc length between two points (we
have normalized so that the total length of $S^1$ is one).

Before we continue, we introduce some notation. Choose $\alpha$ such that
$X_1 = e^{2\pi i \alpha}$. For $k = 2, \ldots, n$ choose $U_k \in [0, 1]$ such that
$X_k = e^{2\pi i (\alpha + U_k)}$.
We remark that each $U_k$ is uniformly distributed on $[0, 1]$. Now reorder
the $\{U_k\}$ to obtain the order statistic$^5$:

$0 < U_{n:1} < U_{n:2} < \ldots < U_{n:n-1} < 1$.

Let $U_{n:0} = 0$ and $U_{n:n} = 1$. Reorder the $\{X_k\}$ as $\{X_{n:k}\}$ to correspond
with the $\{U_{n:k}\}$. Then for $1 \le k \le n$ define

$S_k = U_{n:k} - U_{n:k-1}$.

The set $S = \{S_1, \ldots, S_n\}$ is called the set of spacings [Pyk65]. We
remark that if $U_k = U_{n:j}$ with $1 \le j \le n - 1$ and we take the usual
orientation of $S^1$, then the distances from $X_k$ to its nearest backward
neighbor and nearest forward neighbor are $S_j$ and $S_{j+1}$, respectively.
Also the distances from $X_1$ to its neighbors are $S_n$ and $S_1$.

4 In this case we can define $\beta_0(x, \vartheta)$ for $x \in [0, 1]$.

5 Equality among any of the terms occurs with probability zero.

It is well known (for example, [Dev81]) that

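Lemma 4.1. $(S_1, \ldots, S_n)$ is uniformly distributed on the standard $(n-1)$-simplex
$\{ (x_1, \ldots, x_n) \mid x_i \ge 0, \sum_{i=1}^n x_i = 1 \}$. It follows that, for $a_1, \ldots, a_n \ge 0$,

$P[S_1 > a_1, \ldots, S_n > a_n] = (1 - \sum_{i=1}^n a_i)^{n-1}$ if $\sum_{i=1}^n a_i < 1$, and $0$ otherwise,

and

(4.1) (Whitworth, 1897) $P(S_{n:n} > x) = \sum_{k \ge 1,\ kx < 1} (-1)^{k+1} (1 - kx)^{n-1} \binom{n}{k}$, $x > 0$.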
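Whitworth's formula (4.1) is a finite alternating sum and is straightforward to evaluate. The sketch below is a direct transcription; the function name is ours.

```python
from math import comb

def whitworth_tail(n, x):
    """P(S_{n:n} > x) for the largest of n uniform spacings, formula (4.1)."""
    if x <= 0:
        return 1.0
    total = 0.0
    k = 1
    while k <= n and k * x < 1:
        total += (-1) ** (k + 1) * (1 - k * x) ** (n - 1) * comb(n, k)
        k += 1
    return total

# Probability that all n = 5 points lie on a common semicircle.
semicircle = whitworth_tail(5, 0.5)
```

At $x = \frac12$ only the $k = 1$ term survives, giving $n / 2^{n-1}$, which is $5/16$ for $n = 5$.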

Finally, order the spacings to obtain

$0 < S_{n:1} < S_{n:2} < \ldots < S_{n:n} < 1$.

Now we are ready to calculate the homology in degree 0. Recall
that $\beta_0(\mathcal{F}^R_r(\Delta_*(X)))$ equals the dimension of $H_0(\mathcal{F}^R_r(\Delta_*(X)))$, which
equals the number of path components of $\mathcal{F}^R_r(\Delta_*(X))$. Recall that
$\mathcal{F}^R_r(\Delta_0(X))$ is the empty set for $r < 0$ and is the set $X$ for $r \ge 0$.
So at $r = 0$, there are (almost surely) exactly $n$ distinct homology
classes in $H_0(\mathcal{F}^R_r(\Delta_*(X)))$. Each homology class $[X_k]$ will no longer
have a distinct representative when the distance from $X_k$ to one of
its neighbors is equal to $r$. That is, each time $r$ passes one of the
$S_k$ the dimension of $H_0(\mathcal{F}^R_r(\Delta_*(X)))$ decreases by one. Therefore for
$k = 0, \ldots, n-2$,

$r \in [S_{n:k}, S_{n:k+1}) \implies \beta_0(\mathcal{F}^R_r(\Delta_*(X))) = n - k$.

When $r \ge S_{n:n-1}$, $\mathcal{F}^R_r(\Delta_*(X))$ is path connected so $\beta_0(\mathcal{F}^R_r(\Delta_*(X))) = 1$.
Translating this, we see that the Betti-0 barcode is the collection of
homology intervals

$[0, S_{n:k})$ for $k = 1, \ldots, n-1$ and $[0, \infty]$.
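The description of the Betti-0 barcode in terms of ordered spacings translates directly into a short computation. The sketch below assumes the sample is given as coordinates in $[0, 1)$ on a circle of total length one (so that arc-length distances are differences of coordinates); the function names are illustrative.

```python
import math

def spacings(points):
    """Spacings of points on a circle of total length one (coordinates in [0, 1))."""
    pts = sorted(p % 1.0 for p in points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])  # gap across the base point
    return gaps

def betti0_barcode(points):
    """Intervals [0, S_{n:k}) for k = 1..n-1, plus the infinite interval [0, inf]."""
    ordered = sorted(spacings(points))
    bars = [(0.0, s) for s in ordered[:-1]]  # the largest spacing closes no component
    bars.append((0.0, math.inf))
    return bars

sample = [0.0, 0.1, 0.4, 0.7]
bars = betti0_barcode(sample)
```

For this sample the spacings are 0.1, 0.3, 0.3 and 0.3, so the barcode consists of one short interval, two intervals of length about 0.3, and the infinite interval.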

Finally, let us consider the homology in degree 1. Let

$\alpha = (X_{n:1}, X_{n:2}) + \ldots + (X_{n:n-1}, X_{n:n}) + (X_{n:n}, X_{n:1})$.

This is a 1-cycle in $\Delta_*(X)$.

Lemma 4.2. If $S_{n:n} < \frac12$ then the Betti-1 barcode is the single (possibly
empty) persistence homology interval

$I_\alpha = [S_{n:n}, R)$, where $R \in [\frac13, \frac12]$,

otherwise it is empty.

Remark 4.3. If the largest spacing $S_{n:n}$ is greater than or equal to
$\frac12$ then all of the points $X_1, \ldots, X_n$ are concentrated on a semicircle,
and $\mathcal{F}^R_r(\Delta_*(X))$ does not contain any non-trivial 1-cycles. By (4.1),

$P[S_{n:n} \ge \frac12] = \frac{n}{2^{n-1}}$.

Proof. Assume that $S_{n:n} < \frac12$. If $r \ge S_{n:n}$, then $\alpha \in \mathcal{F}^R_r(\Delta_1(X))$.
We claim that by using the definition of the Rips filtration and the
geometry of $S^1$, $\alpha$ becomes a boundary at some $R \in [\frac13, \frac12]$. Since half
the perimeter of $S^1$ is $\frac12$, when $r \ge \frac12$, $(X_i, X_j) \in \mathcal{F}^R_r(\Delta_1(X))$ for all
$X_i, X_j \in X$. Thus when $r \ge \frac12$ then $\mathcal{F}^R_r(\Delta_*(X)) = \Delta_*(X)$ which is the
full $(n-1)$-simplex on the vertices $X_1, \ldots, X_n$. In particular if $r \ge \frac12$,
then $\alpha$ is a boundary.

Since $S_{n:n} < \frac12$, the geometric realization of $\alpha$ is an $n$-gon containing
the center of $S^1$. Thus if there is some $\beta = \sum a_{ijk} (X_i, X_j, X_k) \in
\mathcal{F}^R_r(\Delta_2(X))$ such that $d\beta = \alpha$ then for some $(X_i, X_j, X_k) \in \mathcal{F}^R_r(\Delta_2(X))$
the geometric realization of $(X_i, X_j, X_k)$ contains the center of $S^1$. The
smallest $r$ for which this can happen is $\frac13$. So if $r < \frac13$ then $\alpha$ cannot
be a boundary.

Thus $\alpha$ becomes a boundary when $r = R$ for some $R \in [\frac13, \frac12]$. If
$S_{n:n} \ge \frac13$ it is possible that $R = S_{n:n}$, and $\alpha$ is not a non-trivial boundary
in any $\mathcal{F}^R_r(\Delta_*(X))$. □



Remark 4.4. If $S_{n:n} < \frac13$ then the Betti-1 barcode is a single non-empty
persistence homology interval. Using (4.1), $P[S_{n:n} \ge \frac13] \le n \left(\frac23\right)^{n-1}$.

4.2. Expected values of the Betti barcodes. Let $U_1, \ldots, U_{n-1}$ be a sample from the uniform distribution on $[0,1]$. Let $0 \leq U_{n:1} \leq U_{n:2} \leq \ldots \leq U_{n:n-1} \leq 1$ be the corresponding order statistic.^6 Define $U_{n:0} = 0$ and $U_{n:n} = 1$. For $k = 1, \ldots, n$, let $S_k = U_{n:k} - U_{n:k-1}$. Recall (Lemma 4.1) that the set of spacings $S = \{S_1, \ldots, S_n\}$ is uniformly distributed on the standard $(n-1)$-simplex.

Let $0 \leq S_{n:1} \leq \ldots \leq S_{n:n} \leq 1$ be the order statistic for the spacings. Then one can show [SW86, 21.1.15] that

Proposition 4.5. For $1 \leq i \leq n$ the expected value of the spacings is given by

$$ES_{n:i} = \frac{1}{n}\sum_{j=1}^{i}\frac{1}{n+1-j} = \frac{1}{n}\sum_{j=n+1-i}^{n}\frac{1}{j}.$$



^6 We use $n$ here to match the notation of Section 4.1 where $\{U_1, \ldots, U_{n-1}\}$ is derived from $\{X_1, \ldots, X_n\} \in S^1$.




So the expected Betti-0 barcode is the collection of intervals

$$\left\{\left[0, \frac{1}{n}\sum_{j=1}^{i}\frac{1}{n+1-j}\right)\right\}_{i \in \{1, \ldots, n-1\}} \cup \{[0, \infty]\},$$



and the expected Betti-1 barcode is 



\V-t' J] 

I n n + 1 — j | 



To obtain the Betti-0 function from the Betti-0 barcode let

$${}_n\tilde\beta_0(x, 0) = ES_{n:\lceil (n-1)x \rceil}.$$

The Betti-0 function is a normalized version of this, ${}_n\beta_0(x, 0) = c_n\, {}_n\tilde\beta_0(x, 0)$, so that $\int_0^1 {}_n\beta_0(x, 0)\,dx = 1$. (In fact, $c_n = \frac{n-1}{1 - ES_{n:n}}$, which for large values of $n$ is approximately equal to $n$.) Thus,

$${}_n\beta_0(x, 0) = \frac{c_n}{n}\sum_{j=1}^{\lceil (n-1)x \rceil}\frac{1}{n+1-j} = \frac{c_n}{n}\sum_{j=n+1-\lceil (n-1)x \rceil}^{n}\frac{1}{j}.$$

Proposition 4.6. For $0 \leq x < 1$, as $n \to \infty$,

$${}_n\beta_0(x, 0) \to -\ln(1 - x).$$



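Proof. By the definition of $c_n$, $\lim_{n \to \infty} \frac{c_n}{n} = 1$. The result then follows from the observation that

$$\frac{1}{n} + \int_k^n \frac{1}{x}\,dx \leq \sum_{j=k}^{n}\frac{1}{j} \leq \frac{1}{k} + \int_k^n \frac{1}{x}\,dx$$

and the fact that

$$\lim_{n \to \infty} \ln\left(\frac{n}{n + 1 - \lceil (n-1)x \rceil}\right) = -\ln(1 - x).$$

□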



In Figure 1, we graph the expected Betti-0 functions $y = {}_{10}\beta_0(x, 0)$ and $y = {}_{100}\beta_0(x, 0)$ and the limiting function $y = -\ln(1 - x)$. For comparison, we also graph $y = 1$, the limiting function one would obtain if the spacings became relatively equal in the limit.

FIGURE 1. Graphs of the expected Betti 0-function for $n = 10, 100$ and $f(x) = -\ln(1 - x)$.
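The convergence in Proposition 4.6 can be checked numerically. The following sketch is an illustration under stated assumptions, not code from the paper: it evaluates the expected spacings of Proposition 4.5, takes the normalizing constant to be $c_n = (n-1)/(1 - ES_{n:n})$ (the value forced by requiring the step function to integrate to 1), and clamps the index $\lceil (n-1)x \rceil$ at 1 so that $x = 0$ is handled. The function names are hypothetical.

```python
import math

def expected_spacings(n):
    """E[S_{n:i}] for i = 1..n, using the partial sums of Proposition 4.5."""
    out, acc = [], 0.0
    for j in range(1, n + 1):
        acc += 1.0 / (n + 1 - j)
        out.append(acc / n)
    return out

def betti0_function(n, x):
    """Normalized expected Betti-0 function at x in [0, 1)."""
    es = expected_spacings(n)
    c_n = (n - 1) / (1.0 - es[-1])      # assumed normalization: integral over [0, 1] is 1
    i = max(1, math.ceil((n - 1) * x))  # index ceil((n-1)x), clamped at 1
    return c_n * es[i - 1]

for n in (10, 100, 10000):
    # Middle column should approach the right-hand column as n grows.
    print(n, betti0_function(n, 0.5), -math.log(1 - 0.5))
```

Because the expected spacings sum to 1, the same list also gives a quick consistency check on Proposition 4.5 itself.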

5. Barcodes of certain parametric densities 

5.1. The von Mises distribution. Let $M = S^1 = \{e^{i\theta} \mid \theta \in [-\pi, \pi)\} \subset \mathbb{R}^2$. We will use this parametrization to identify $\theta \in [-\pi, \pi)$ with an element of $S^1$. Consider the von Mises density on $S^1$ with respect to the uniform measure,

$$f_{\mu,\kappa}(\theta) = \frac{1}{I_0(\kappa)} e^{\kappa\cos(\theta - \mu)},$$

where $\mu \in [-\pi, \pi)$, $\kappa \in [0, \infty)$ and $I_0(\kappa)$ is the modified Bessel function of the first kind and order 0, where the general $\nu$-th order Bessel function of the first kind is

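$$I_\nu(\kappa) = \frac{(\kappa/2)^\nu}{\Gamma(\nu + \frac{1}{2})\Gamma(\frac{1}{2})}\int_{-1}^{1} e^{\kappa t}(1 - t^2)^{\nu - \frac{1}{2}}\,dt$$

and $\Gamma(\cdot)$ denotes the gamma function.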

Our homologies will be independent of $\mu$, so assume that $\mu = 0$ and so in this case the parameter $\vartheta = \kappa$.

We will filter the chain complex on $S^1$ using both the Čech and Morse filtrations. Recall that by (3.3) and (3.2), $S^1_{\geq \frac{1}{r}} = \{\theta \in S^1 \mid f_\kappa(\theta) \geq \frac{1}{r}\}$ and $S^1_{\leq r} = \{\theta \in S^1 \mid f_\kappa(\theta) \leq r\}$. Choose $\alpha_{r,\kappa} \in [-\pi, \pi)$ such that

$$f_\kappa(\alpha_{r,\kappa}) = r.$$

Specifically, let $\alpha_{r,\kappa} = \cos^{-1}\left(\frac{1}{\kappa}\ln(rI_0(\kappa))\right)$. Our calculations of the persistent homology will follow from the following straightforward result.
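
Lemma 5.1. For $0 < r < \frac{1}{\max f_\kappa}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for $r < \min f_\kappa$, $S^1_{\leq r} = \emptyset$. For $\frac{1}{\max f_\kappa} \leq r < \frac{1}{\min f_\kappa}$,

$$S^1_{\geq \frac{1}{r}} = \{-\alpha_{\frac{1}{r},\kappa} \leq \theta \leq \alpha_{\frac{1}{r},\kappa}\}.$$

For $\min f_\kappa \leq r < \max f_\kappa$,

$$S^1_{\leq r} = \{\alpha_{r,\kappa} \leq \theta \leq 2\pi - \alpha_{r,\kappa}\}.$$

For $r \geq \frac{1}{\min f_\kappa}$, $S^1_{\geq \frac{1}{r}} = S^1$, and for $r \geq \max f_\kappa$, $S^1_{\leq r} = S^1$.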

Since its analysis is simpler, we start with the Morse filtration on $S^1$. By Lemma 5.1, $S^1_{\leq r}$ is empty if $r < \min f_\kappa$, it is contractible (see Remark 2.7) if $\min f_\kappa \leq r < \max f_\kappa$ and it is equal to $S^1$ if $r \geq \max f_\kappa$. It follows that the Betti-0 barcode for the Morse filtration is the single interval

$$[\min f_\kappa, \infty] = \left[\frac{e^{-\kappa}}{I_0(\kappa)}, \infty\right],$$

the Betti-1 barcode is the single interval

$$[\max f_\kappa, \infty] = \left[\frac{e^{\kappa}}{I_0(\kappa)}, \infty\right],$$

and all other Betti-$k$ barcodes are empty.

Now consider the Čech filtration on $S^1$. We will derive a formula for the Betti-0 function, $\beta_0(x, \kappa)$, and calculate the Betti-$k$ barcodes for $k > 0$.

If $\kappa = 0$ then $f = 1$. So for $r < 1$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for $r \geq 1$, $S^1_{\geq \frac{1}{r}} = S^1$. By definition (3.6),

$$g_\kappa(r) = \begin{cases} 0 & \text{if } r < 1, \\ 1 & \text{if } r \geq 1. \end{cases}$$

So by definition (3.7), $\beta_0(x, 0) = 1$.

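For $\kappa > 0$, let $\min(f_\kappa) = \frac{1}{I_0(\kappa)}e^{-\kappa}$ and $\max(f_\kappa) = \frac{1}{I_0(\kappa)}e^{\kappa}$. For $r < \frac{1}{\max(f_\kappa)}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for $r \geq \frac{1}{\min(f_\kappa)}$, $S^1_{\geq \frac{1}{r}} = S^1$. For $\frac{1}{\max(f_\kappa)} \leq r < \frac{1}{\min(f_\kappa)}$, since $f_\kappa$ is even and decreasing for $\theta > 0$,

$$S^1_{\geq \frac{1}{r}} = \{-\alpha_{r,\kappa} \leq \theta \leq \alpha_{r,\kappa}\},$$

where $\alpha_{r,\kappa} \in (0, \pi)$ and $f_\kappa(\alpha_{r,\kappa}) = \frac{1}{r}$.

Let $x \in [0, 1]$ and assume that $\beta_0(x, \kappa) = r$. Since $\kappa > 0$, $g_\kappa(r) = \int_{S^1_{\geq 1/r}} f_\kappa(\theta)\,d\theta$ is continuous and strictly increasing. So $x = g_\kappa(r)$.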
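
Define $\alpha_{r,\kappa} \in [0, \pi]$ by the condition that $f_\kappa(\alpha_{r,\kappa}) = \frac{1}{r}$. So

$$r = \frac{1}{f_\kappa(\alpha_{r,\kappa})}. \tag{5.2}$$

For $\vartheta \in [0, \pi]$, let

$$F_\kappa(\vartheta) = \int_0^{\vartheta} f_\kappa(\theta)\,d\theta.$$

Then

$$x = \int_{S^1_{\geq 1/r}} f_\kappa\,d\nu = \int_{-\alpha_{r,\kappa}}^{\alpha_{r,\kappa}} f_\kappa(\theta)\,d\theta = 2F_\kappa(\alpha_{r,\kappa}). \tag{5.3}$$

Since $F_\kappa$ is strictly increasing, it is invertible. So $\alpha_{r,\kappa} = F_\kappa^{-1}(\frac{x}{2})$. Thus

$$\beta_0(x, \kappa) = \frac{1}{f_\kappa(F_\kappa^{-1}(\frac{x}{2}))}. \tag{5.4}$$

Since $f_\kappa$ and $F_\kappa$ are smooth, by the inverse function theorem, so is $F_\kappa^{-1}$. So

$$\beta_0(x, \kappa) = (F_\kappa^{-1})'(\tfrac{x}{2}).$$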

We remark that as $\kappa \to 0$, $\beta_0(x, \kappa) \to 1 = \beta_0(x, 0)$. We can also describe the graph of $r = \beta_0(x, \kappa)$ parametrically by combining (5.2) and (5.3) (see Figure 2):

$$h_\kappa(t) = \left(2F_\kappa(t), \frac{1}{f_\kappa(t)}\right), \quad t \in [0, \pi]. \tag{5.5}$$
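The parametric description of the Betti-0 function lends itself to direct computation. The sketch below is illustrative only: it assumes the von Mises density $e^{\kappa\cos\theta}/I_0(\kappa)$ with $\mu = 0$, takes the uniform measure to be $d\theta/(2\pi)$ so that the total mass is 1, computes $I_0$ from its power series, and integrates with the trapezoid rule. The function names and the step count are hypothetical choices.

```python
import math

def bessel_i0(kappa, terms=60):
    """Modified Bessel function I_0 via its power series sum (kappa/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def von_mises_density(theta, kappa):
    """Density with respect to the uniform probability measure on S^1 (mu = 0)."""
    return math.exp(kappa * math.cos(theta)) / bessel_i0(kappa)

def betti0_curve(kappa, steps=2000):
    """Points (x, r) = (2 F(t), 1 / f(t)) for t in [0, pi], following (5.5)."""
    h = math.pi / steps
    prev = von_mises_density(0.0, kappa)
    curve, mass = [(0.0, 1.0 / prev)], 0.0
    for i in range(1, steps + 1):
        cur = von_mises_density(i * h, kappa)
        # trapezoid rule; dividing by 2*pi normalizes d(theta) to total mass 1
        mass += 0.5 * (prev + cur) * h / (2.0 * math.pi)
        curve.append((2.0 * mass, 1.0 / cur))
        prev = cur
    return curve

curve = betti0_curve(kappa=1.0)
print(curve[0], curve[-1])  # endpoints: x runs from 0 to 1, r from 1/max f to 1/min f
```

Plotting the returned pairs for several values of `kappa` would give curves of the kind described for Figure 2; the last point has $r = 1/\min f_\kappa = I_0(\kappa)e^{\kappa}$.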

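For $k \geq 1$, recall that

$$\mathcal{F}^C_r(C_k(S^1)) = \mathrm{Const}_k + C_k(S^1_{\geq \frac{1}{r}}).$$

Also recall that for $r < \frac{1}{\max(f_\kappa)}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, for $\frac{1}{\max(f_\kappa)} \leq r < \frac{1}{\min(f_\kappa)}$, $S^1_{\geq \frac{1}{r}}$ is the arc from $-\alpha_{r,\kappa}$ to $\alpha_{r,\kappa}$ where $f_\kappa(\alpha_{r,\kappa}) = \frac{1}{r}$, and for $r \geq \frac{1}{\min(f_\kappa)}$, $S^1_{\geq \frac{1}{r}} = S^1$. It follows that for $k \geq 1$,

$$H_k(\mathcal{F}^C_r(C_*(S^1))) = \begin{cases} \mathbb{F} & \text{for } k = 1 \text{ and } r \geq \frac{1}{\min(f_\kappa)}, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore the Betti-1 barcode has the single interval

$$\left[\frac{1}{\min f_\kappa}, \infty\right] = [I_0(\kappa)e^{\kappa}, \infty] \tag{5.6}$$

and for $k > 1$ the Betti-$k$ barcode is empty.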

FIGURE 2. Graph of the Betti 0-function of the von Mises density for a range of concentration parameters

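5.2. The von Mises-Fisher distribution. Now consider $M = S^{p-1}$, $p \geq 3$ and the unimodal von Mises-Fisher density given by

$$f_{\mu,\kappa}(x) = c(\kappa)\exp\{\kappa x^t\mu\},$$

where $\kappa \in [0, \infty)$, $\mu \in S^{p-1}$, and

$$c(\kappa) = \left(\frac{\kappa}{2}\right)^{\frac{p}{2}-1}\frac{1}{\Gamma(\frac{p}{2})I_{\frac{p}{2}-1}(\kappa)}$$

is the normalizing constant with respect to the uniform measure. This is also known as the Langevin distribution. Note that the minimum and maximum of $f$ also do not depend on $\mu$: $\min(f_\kappa) = c(\kappa)e^{-\kappa}$ and $\max(f_\kappa) = c(\kappa)e^{\kappa}$. In fact, by symmetry the homologies will not depend on $\mu$. Hence once again take $\vartheta = \kappa$.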

Consider the Morse filtration (defined in Section 3.3.1) on $S^{p-1}$. If $r < \min(f_\kappa)$ then $S^{p-1}_{\leq r} = \emptyset$ and if $r \geq \max(f_\kappa)$ then $S^{p-1}_{\leq r} = S^{p-1}$. For $\min(f_\kappa) \leq r < \max(f_\kappa)$,

$$S^{p-1}_{\leq r} = \{x \in S^{p-1} \mid x^t\mu \leq \alpha_{r,\kappa}\},$$

where $\alpha_{r,\kappa} = \frac{1}{\kappa}\ln\left(\frac{r}{c(\kappa)}\right) \in [-1, 1]$. So $S^{p-1}_{\leq r}$ is the closure of $S^{p-1}$ minus a right circular cone with vertex 0 and centered at $\mu$. In particular, $S^{p-1}_{\leq r}$ is contractible (see Remark 2.7), so $H_0(\mathcal{F}^M_r(C_*(S^{p-1}))) = \mathbb{F}$ and for $k \geq 1$, $H_k(\mathcal{F}^M_r(C_*(S^{p-1}))) = 0$.

Thus the Betti-0 barcode is the single interval $[\min(f_\kappa), \infty)$, the Betti-$(p-1)$ barcode is the single interval $[\max(f_\kappa), \infty)$ and all other barcodes are empty.

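Consider the Čech filtration (defined in Section 3.3.2) on $S^{p-1}$. For $\frac{1}{\max(f_\kappa)} \leq r < \frac{1}{\min(f_\kappa)}$,

$$S^{p-1}_{\geq \frac{1}{r}} = \{x \in S^{p-1} \mid x^t\mu \geq \alpha_{\frac{1}{r},\kappa}\}.$$

So $S^{p-1}_{\geq \frac{1}{r}}$ is the intersection of $S^{p-1}$ and a right circular cone with vertex 0 and centered at $\mu$. In particular for $\frac{1}{\max(f_\kappa)} \leq r < \frac{1}{\min(f_\kappa)}$, $S^{p-1}_{\geq \frac{1}{r}}$ is contractible, so for $k \geq 1$, $H_k(S^{p-1}_{\geq \frac{1}{r}}) = 0$.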

— r 

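Assume $\kappa = 0$. Then $f = c(0)$, and

$$S^{p-1}_{\geq \frac{1}{r}} = \begin{cases} \emptyset & \text{if } r < \frac{1}{c(0)}, \\ S^{p-1} & \text{if } r \geq \frac{1}{c(0)}. \end{cases}$$

Thus

$$g_0(r) = \begin{cases} 0 & \text{if } r < \frac{1}{c(0)}, \\ 1 & \text{if } r \geq \frac{1}{c(0)}. \end{cases}$$

Therefore $\beta_0(x, 0) := \inf_{g_0(r) \geq x} r = \frac{1}{c(0)}$.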

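Assume $\kappa > 0$. Then for $\frac{1}{\max(f_\kappa)} \leq r \leq \frac{1}{\min(f_\kappa)}$,

$$x = g_\kappa(r) = \int_{S^{p-1}_{\geq 1/r}} f_\kappa = c(\kappa)\frac{s_{p-2}}{s_{p-1}}\int_0^{\alpha_\kappa(r)} e^{\kappa\cos\theta}\sin^{p-2}\theta\,d\theta, \tag{5.8}$$

where $\alpha_\kappa(r) = \cos^{-1}\left(-\frac{1}{\kappa}\ln(rc(\kappa))\right)$ and $s_{p-1} = \frac{2\pi^{p/2}}{\Gamma(p/2)}$. When $\kappa > 0$, $g_\kappa(r)$ is continuous and strictly increasing. Hence

$$\beta_0(x, \kappa) = g_\kappa^{-1}(x) \tag{5.9}$$

for $x \in [0, 1]$ and $\kappa > 0$. As we did for the von Mises distribution (5.5), we can describe the graph of $r = \beta_0(x, \kappa)$ more explicitly using a parametric equation:

$$\left(c(\kappa)\frac{s_{p-2}}{s_{p-1}}\int_0^t e^{\kappa\cos\theta}\sin^{p-2}\theta\,d\theta,\ \frac{e^{-\kappa\cos t}}{c(\kappa)}\right), \quad t \in [0, \pi]. \tag{5.10}$$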

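For $k \geq 1$, by Lemma 3.8,

$$H_k(\mathcal{F}^C_r(C_*(S^{p-1}))) = \begin{cases} \mathbb{F} & \text{for } k = p - 1 \text{ and } r \geq \frac{1}{\min(f_\kappa)}, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore for $k \geq 1$ the Betti-$k$ barcode has the single interval:

$$\left[\frac{1}{\min(f_\kappa)}, \infty\right] = \left[\frac{e^{\kappa}}{c(\kappa)}, \infty\right] \tag{5.11}$$

for $k = p - 1$ and is empty otherwise.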

5.3. The Watson distribution. Let $M = S^{p-1}$ and consider the following bimodal distribution

$$f_{\mu,\kappa}(x) = d(\kappa)\exp\{\kappa(x^t\mu)^2\}, \tag{5.12}$$

where $\kappa > 0$ and $x, \mu \in S^{p-1}$, called the Watson distribution. We remark that this density is rotationally symmetric, where $\mu$ is the axis of rotation. The minimum and maximum densities are given by

$$\min f = d(\kappa), \quad \max f = d(\kappa)e^{\kappa}.$$

The maximum is achieved at $x = \pm\mu$ and the minimum is achieved at all $x$ such that $x^t\mu = 0$.

Using the Morse filtration we get the following Betti barcodes. For $p = 2$, we remark that for $r < \min f$, $S^1_{\leq r} = \emptyset$. For $r = \min f$, $S^1_{\leq r}$ is two points. As $r$ increases, these points become two arcs of increasing size, which connect when $r = \max f$. So the Betti-0 barcode consists of the two homology intervals $[\min f, \infty]$ and $[\min f, \max f)$, and the Betti-1 barcode has the single interval $[\max f, \infty]$. All other Betti barcodes are empty.

For $p > 2$, we observe similar behavior. When $r < \min f$, $S^{p-1}_{\leq r} = \emptyset$. For $r = \min f$, $S^{p-1}_{\leq r}$ is the equator, which is homeomorphic to $S^{p-2}$. As $r$ increases, the equator expands until it reaches the poles when $r = \max f$. So the Betti-0, Betti-$(p-2)$ and Betti-$(p-1)$ barcodes each consist of a single homology interval: $[\min f, \infty]$, $[\min f, \max f)$, and $[\max f, \infty]$, respectively. All other Betti barcodes are empty.

Using the Čech filtration, $S^{p-1}_{\geq \frac{1}{r}}$ is either empty, or consists of two contractible components, or is all of $S^{p-1}$. So the Betti-$(p-1)$ barcode is the single homology interval $[\frac{1}{\min f}, \infty]$ and the Betti-$k$ barcodes for all other $k \geq 1$ are empty. The Betti-0 function is given by $\beta_0(x, \kappa) = g_\kappa^{-1}(x)$, where

$$g_\kappa(r) = \int_{S^{p-1}_{\geq 1/r}} f_\kappa = 2\frac{s_{p-2}}{s_{p-1}}\int_0^{\alpha_\kappa(r)} d(\kappa)e^{\kappa\cos^2\theta}\sin^{p-2}(\theta)\,d\theta,$$

with $\alpha_\kappa(r) = \cos^{-1}\left(\sqrt{-\frac{1}{\kappa}\ln(d(\kappa)r)}\right)$ and $s_{p-1} = \frac{2\pi^{p/2}}{\Gamma(p/2)}$. As with the von Mises (5.5) and von Mises-Fisher distributions (5.10), the Betti-0 function can also be described parametrically.

5.4. The Bingham distribution. Again let $M = S^{p-1}$ with the probability density

$$f_K(x) = d(K)\exp\{x^tKx\},$$

where $x \in S^{p-1} \subset \mathbb{R}^p$ and $K$ is a symmetric $p \times p$ matrix. We remark that $f_K(x) = d(K)\exp\{\mathrm{tr}\,Kxx^t\}$. Also, by a change of coordinates we can write $K = \mathrm{diag}(k_1, \ldots, k_p)$, where $k_p \geq \ldots \geq k_1$ are the eigenvalues of $K$. Let $v_i$ be the eigenvector associated to $k_i$.

Assume that $k_p > \ldots > k_1 > 0$. Then the minimum and maximum values of $f_K$ are given by

$$\min f_K = d(K)e^{k_1}, \quad \max f_K = d(K)e^{k_p},$$

and are attained at $\pm v_1$ and $\pm v_p$.

The Betti-$k$ barcodes (for $k \geq 1$) when $p = 2$ are the same as for the Watson distribution. When $p \geq 3$, the Bingham distribution differs significantly from the Watson distribution. For example, the minimum of the function is attained at only $\pm v_1$, which is certainly not homeomorphic to $S^{p-2}$.

Consider the Morse filtration. We can calculate the Betti-$k$ barcodes inductively. If we consider $v_p$ to be the north pole, then there is a homotopy from $S^{p-1} - \{v_p, -v_p\}$ to $S^{p-2}$ which collapses the sphere missing its poles to the equator. When $r < d(K)e^{k_p}$, by the symmetry of $f_K$ this homotopy also gives a homotopy from $S^{p-1}_{\leq r}$ to $S^{p-2}_{\leq r}$, where the filtration on $S^{p-2}$ is the Morse filtration associated to the Bingham distribution with $K = \mathrm{diag}(k_1, \ldots, k_{p-1})$.

As a result, the Betti-0 barcode is given by the two homology intervals $[d(K)e^{k_1}, \infty]$ and $[d(K)e^{k_1}, d(K)e^{k_2})$. For $1 \leq i \leq p - 2$, the Betti-$i$ barcode is given by the interval $[d(K)e^{k_{i+1}}, d(K)e^{k_{i+2}})$. Finally, the Betti-$(p-1)$ barcode is given by the interval $[d(K)e^{k_p}, \infty]$.

We remark that this barcode corresponds to the cellular construction of $S^{p-1}$ that repeatedly attaches northern and southern hemispheres of increasing dimension.

For the Čech filtration we can use the same argument starting with $v_1$. The Betti-0 barcode is given by the two homology intervals $\frac{1}{d(K)}[e^{-k_p}, \infty]$ and $\frac{1}{d(K)}[e^{-k_p}, e^{-k_{p-1}})$. For $1 \leq i \leq p - 2$, the Betti-$i$ barcode is given by the single interval $\frac{1}{d(K)}[e^{-k_{p-i}}, e^{-k_{p-i-1}})$. The Betti-$(p-1)$ barcode is given by the single interval $\frac{1}{d(K)}[e^{-k_1}, \infty]$.

We remark that the correspondence between the two sets of barcodes is a manifestation of Alexander duality.
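The two Bingham barcodes are determined by the ordered eigenvalues alone, which the following sketch makes explicit. It is an illustration under assumptions rather than part of the paper: the eigenvalues are taken to be distinct and positive with $p \geq 2$, the normalizing constant $d(K)$ is passed in as a given number `d` (it is not computed here), and the function name `bingham_barcodes` is hypothetical.

```python
import math

def bingham_barcodes(eigs, d):
    """Morse and Cech barcodes for distinct eigenvalues 0 < k_1 < ... < k_p.

    `d` stands in for the normalizing constant d(K), assumed to be supplied.
    Returns two dicts mapping homological degree to a list of (birth, death).
    """
    ks = sorted(eigs)
    p = len(ks)
    inf = math.inf
    lev = [d * math.exp(k) for k in ks]     # critical values d(K) e^{k_i}, increasing
    morse = {0: [(lev[0], inf), (lev[0], lev[1])]}
    for i in range(1, p - 1):
        morse[i] = [(lev[i], lev[i + 1])]
    morse[p - 1] = [(lev[p - 1], inf)]
    inv = [1.0 / v for v in reversed(lev)]  # Cech thresholds e^{-k_i} / d(K), increasing
    cech = {0: [(inv[0], inf), (inv[0], inv[1])]}
    for i in range(1, p - 1):
        cech[i] = [(inv[i], inv[i + 1])]
    cech[p - 1] = [(inv[p - 1], inf)]
    return morse, cech

morse, cech = bingham_barcodes([1.0, 2.0, 3.0], d=1.0)
print(morse)
print(cech)
```

Reading the two dictionaries side by side shows the correspondence noted above: each Čech endpoint is the reciprocal of a Morse endpoint, taken in the opposite order.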

5.5. The matrix von Mises distribution and a Hopf fibration. 

The Lie group of rotations of $\mathbb{R}^3$, $SO(3)$, can be given the matrix von Mises density

$$f_{A,\kappa}(X) = c(\kappa)\exp\{\kappa\,\mathrm{tr}(X^tA)\}, \tag{5.13}$$

where $A \in SO(3)$ and $\kappa > 0$ is a concentration parameter. We determine the Morse and Čech filtrations of $SO(3)$ via the Hopf fibration $S^3 \to \mathbb{RP}^3$.

The special orthogonal group $SO(3)$ is diffeomorphic to the real projective space $\mathbb{RP}^3$. The map $S^3 \to \mathbb{RP}^3$ which identifies each point on the sphere with the one-dimensional subspace on which it lies is a Hopf fibration whose fiber is $S^0 = \{-1, 1\}$. Thus, $S^3$ is a double-cover of $SO(3)$ (and since $S^3$ is simply-connected, it is the universal cover).

If we represent $S^3$ with the unit quaternions and $\mathbb{RP}^3$ with $SO(3)$, then the Hopf fibration above is represented by the Cayley-Klein map $\rho : S^3 \to SO(3)$:

$$\begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix} \mapsto I + 2p_1B + 2B^2, \text{ where } B = \begin{pmatrix} 0 & -p_4 & p_3 \\ p_4 & 0 & -p_2 \\ -p_3 & p_2 & 0 \end{pmatrix}.$$

We can use this map to relate the matrix von Mises density (5.13) on $SO(3)$ to the Watson density (5.12) on $S^3$ by making the following observation. If $P = \rho(p)$ and $Q = \rho(q)$, then

$$\mathrm{tr}(P^tQ) = 4(p^tq)^2 - 1.$$

Then if $\rho(a) = A$,

$$\rho^{-1}\{X \in SO(3) \mid f_{A,\kappa}(X) = r\} = \{x \in S^3 \mid f_{a,4\kappa}(x) = kr\}, \text{ where } k = \frac{d(4\kappa)e^{\kappa}}{c(\kappa)}.$$

It follows that

$$\rho^{-1}(SO(3)_{\leq r}) = S^3_{\leq kr} \quad \text{and} \quad \rho^{-1}(SO(3)_{\geq \frac{1}{r}}) = S^3_{\geq k\frac{1}{r}},$$

where the filtration on $S^3$ is with respect to the Watson density $f_{a,4\kappa}$. Recall (Section 5.3) that for $\frac{1}{\max f} \leq kr < \frac{1}{\min f}$, $S^3_{\geq k\frac{1}{r}}$ consists of two contractible components. The Hopf fibration $S^3 \to \mathbb{RP}^3$ and equivalently the map $\rho : S^3 \to SO(3)$ identify these two components. So $SO(3)_{\geq \frac{1}{r}}$ is contractible. Therefore, for the Čech filtration the Betti-3 barcode is the single homology interval $[\frac{1}{\min f}, \infty)$ and all other Betti-$k$ barcodes for $k \geq 1$ are empty. The Betti-0 function is identical to the one for the Watson density on $S^3$.
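The trace identity that links the matrix von Mises density to the Watson density can be checked numerically. The sketch below is an illustration, not the authors' code: it assumes the standard unit-quaternion rotation formula $I + 2p_1B + 2B^2$ with $B$ the cross-product matrix of $(p_2, p_3, p_4)$, draws two random unit quaternions, and compares $\mathrm{tr}(P^tQ)$ with $4(p^tq)^2 - 1$. All function names are hypothetical.

```python
import math
import random

def rho(p):
    """Rotation matrix I + 2*p1*B + 2*B^2 for a unit quaternion p = (p1, p2, p3, p4)."""
    p1, p2, p3, p4 = p
    B = [[0.0, -p4, p3], [p4, 0.0, -p2], [-p3, p2, 0.0]]
    B2 = [[sum(B[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return [[(1.0 if i == j else 0.0) + 2.0 * p1 * B[i][j] + 2.0 * B2[i][j]
             for j in range(3)] for i in range(3)]

def random_unit_quaternion(rng):
    """Normalize a standard Gaussian vector in R^4 to the unit sphere S^3."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def trace_of_transpose_product(P, Q):
    """tr(P^t Q), which equals the sum of entrywise products."""
    return sum(P[i][j] * Q[i][j] for i in range(3) for j in range(3))

rng = random.Random(0)
p, q = random_unit_quaternion(rng), random_unit_quaternion(rng)
lhs = trace_of_transpose_product(rho(p), rho(q))
rhs = 4.0 * sum(a * b for a, b in zip(p, q)) ** 2 - 1.0
print(lhs, rhs)  # the two values should agree up to rounding
```

Since the right-hand side depends on $q$ only through $(p^tq)^2$, antipodal quaternions $q$ and $-q$ give the same value, which reflects the double cover $S^3 \to SO(3)$.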

For $\min f \leq kr < \max f$, $S^3_{\leq kr}$ is homotopy equivalent (via a projection onto its equator) to $S^2$. The Hopf fibration $S^3 \to \mathbb{RP}^3$ restricted to the equator gives the Hopf fibration and double cover $S^2 \to \mathbb{RP}^2$. The homotopy equivalence $S^3_{\leq kr} \simeq S^2$ induces a homotopy equivalence $SO(3)_{\leq r} \simeq \mathbb{RP}^2$. Thus for the Morse filtration, the Betti-0 and Betti-3 barcodes are the single homology intervals $[\min f, \infty)$ and $[\max f, \infty)$ and all Betti-$k$ barcodes for $k > 3$ are empty. However, since the fundamental group and integral homology group of degree one of $\mathbb{RP}^2$ are the cyclic group of order two, the Betti-1 and Betti-2 barcodes depend on the choice of the field of coefficients $\mathbb{F}$. If $\mathbb{F}$ is a field of characteristic 0 (e.g. the rationals) then both are empty. However if $\mathbb{F}$ is the field of characteristic two ($\mathbb{Z}/2\mathbb{Z}$), then both are the single homology interval $[\min f, \max f)$.

6. Statistical estimation of the Betti barcodes 

In this section we will calculate the expected persistent homology 
using statistics sampled from various densities. 

6.1. The von Mises and von Mises-Fisher distributions. For point cloud data $x_1, \ldots, x_n$ on $S^{p-1}$ sampled from the von Mises-Fisher distribution (2.3): $f_{\mu,\kappa}(x) = c(\kappa)\exp\{\kappa x^t\mu\}$, we will give the statistical estimators for the (unknown) parameters. We will show that these can be used to obtain good estimates of the persistent homology of the underlying distribution.

Letting $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ denote the sample mean, consider the decomposition

$$\bar{x} = \|\bar{x}\|\left(\frac{\bar{x}}{\|\bar{x}\|}\right).$$

The statistical estimator for $\mu$ is $\hat\mu = \bar{x}/\|\bar{x}\|$, while the statistical estimator for $\kappa$ is obtained [MJ00, Section 10.3.1] by inverting $A_p(\hat\kappa) = \|\bar{x}\|$, where $A_p(\lambda) = \frac{I_{p/2}(\lambda)}{I_{p/2-1}(\lambda)}$ and $I_\nu(\lambda)$ is the modified Bessel function of the first kind and order $\nu$. Hence,

$$\hat\kappa = A_p^{-1}(\|\bar{x}\|). \tag{6.1}$$

A large sample asymptotic normality calculation for (6.1) is [MJ00, Section 10.3.1]

$$\sqrt{n}(\hat\kappa - \kappa) \to_d N(0, A_p'(\kappa)^{-1}), \tag{6.2}$$

as $n \to \infty$, where $\to_d$ means convergence in distribution and $N(0, \sigma^2)$ stands for a normally distributed random variable with mean 0 and variance $\sigma^2 > 0$. Using this estimate of $\kappa$ we obtain estimates for the $\beta_k$ barcodes for the Morse and Čech filtrations. For the Morse filtration, we estimate the $\beta_0$ barcode and $\beta_{p-1}$ barcode to be $[c(\hat\kappa)e^{-\hat\kappa}, \infty]$ and $[c(\hat\kappa)e^{\hat\kappa}, \infty]$, respectively. For the Čech filtration, we estimate the $\beta_{p-1}$ barcode to be $[\frac{e^{\hat\kappa}}{c(\hat\kappa)}, \infty]$.
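For the sphere $S^2$ ($p = 3$) the estimation procedure is short enough to sketch in code. The example below is a hedged illustration, not the authors' implementation: it uses the closed forms $A_3(\kappa) = \coth\kappa - 1/\kappa$ and $c(\kappa) = \kappa/\sinh(\kappa)$ that the text gives for $S^2$, and inverts $A_3$ by bisection, which is one possible numerical choice. The bracket `[lo, hi]`, the iteration count and all function names are assumptions.

```python
import math

def a3(kappa):
    """A_3(kappa) = coth(kappa) - 1/kappa, the mean resultant length on S^2."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa

def estimate_kappa(points, lo=1e-6, hi=500.0, iters=200):
    """Solve A_3(kappa) = ||sample mean|| by bisection (A_3 is increasing)."""
    n = len(points)
    mean = [sum(x[i] for x in points) / n for i in range(3)]
    rbar = math.sqrt(sum(c * c for c in mean))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a3(mid) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimated_barcodes(kappa_hat):
    """Plug-in barcode intervals for p = 3, using c(kappa) = kappa / sinh(kappa)."""
    c = kappa_hat / math.sinh(kappa_hat)
    return {
        "morse_beta0": (c * math.exp(-kappa_hat), math.inf),
        "morse_beta2": (c * math.exp(kappa_hat), math.inf),
        "cech_beta2": (math.exp(kappa_hat) / c, math.inf),
    }

# Toy sample of unit vectors, used only to exercise the functions.
sample = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
kappa_hat = estimate_kappa(sample)
print(kappa_hat, estimated_barcodes(kappa_hat))
```

The upper bracket of 500 keeps `math.sinh` within floating-point range; samples concentrated tightly enough to push the estimate beyond that would need a different treatment.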

Recall that the space of barcodes has a metric $\mathcal{D}$ (see Definition 3.5). Let $\beta^M_i(f)$ and $\beta^C_i(f)$ denote the Betti-$i$ barcode for the density $f$ using the Morse and Čech filtrations. Then the expectations of the distance from the estimated persistent homology to the persistent homology of the underlying density can be bounded as follows.

Theorem 6.1. For the von Mises-Fisher distribution on $S^{p-1}$ and $\kappa \in [\kappa_0, \kappa_1]$, where $0 < \kappa_0 < \kappa_1 < \infty$,

$$E(\mathcal{D}(\beta^M_i(f_{\hat\kappa}), \beta^M_i(f_\kappa))) \leq C(\kappa)n^{-1/2}$$

as $n \to \infty$ for all $i$, and

$$E(\mathcal{D}(\beta^C_i(f_{\hat\kappa}), \beta^C_i(f_\kappa))) \leq C(\kappa)n^{-1/2}$$

as $n \to \infty$ for all $i \geq 1$, for some constant $C(\kappa)$.

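Proof. Since the barcodes have a particularly simple form, we only need to know the barcode metric for the following case:

$$\mathcal{D}(\{[a, \infty]\}, \{[b, \infty]\}) = |a - b|.$$

Using our previous calculations of the Betti barcodes, we have:

$$\mathcal{D}(\beta^M_0(f_{\hat\kappa}), \beta^M_0(f_\kappa)) = |c(\hat\kappa)e^{-\hat\kappa} - c(\kappa)e^{-\kappa}|,$$

$$\mathcal{D}(\beta^M_{p-1}(f_{\hat\kappa}), \beta^M_{p-1}(f_\kappa)) = |c(\hat\kappa)e^{\hat\kappa} - c(\kappa)e^{\kappa}|,$$

$$\mathcal{D}(\beta^C_{p-1}(f_{\hat\kappa}), \beta^C_{p-1}(f_\kappa)) = |c(\hat\kappa)^{-1}e^{\hat\kappa} - c(\kappa)^{-1}e^{\kappa}|.$$

We note that the normalizing constant can be re-expressed as

$$c(\kappa)^{-1} = \frac{1}{B\left(\frac{p-1}{2}, \frac{1}{2}\right)}\int_{-1}^{1} e^{\kappa t}(1 - t^2)^{\frac{p-3}{2}}\,dt,$$

where $B(\cdot, \cdot)$ is the beta function. Furthermore,

$$c'(\kappa) = -B\left(\tfrac{p-1}{2}, \tfrac{1}{2}\right)\frac{\int_{-1}^{1} te^{\kappa t}(1 - t^2)^{\frac{p-3}{2}}\,dt}{\left(\int_{-1}^{1} e^{\kappa t}(1 - t^2)^{\frac{p-3}{2}}\,dt\right)^2}$$

and

$$A_p'(\kappa) = 1 - A_p(\kappa)^2 - \frac{p-1}{\kappa}A_p(\kappa).$$

For $0 < \kappa_0 < \kappa_1 < \infty$ and $\kappa \in [\kappa_0, \kappa_1]$, we observe $0 < c(\kappa), |c'(\kappa)|, A_p'(\kappa) < \infty$, and by the mean value theorem,

$$E|c(\hat\kappa)e^{\hat\kappa} - c(\kappa)e^{\kappa}| = E|(c(\kappa^*) + c'(\kappa^*))e^{\kappa^*}(\hat\kappa - \kappa)|,$$

where $\kappa^*$ is a value between $\hat\kappa$ and $\kappa$. Consequently,

$$E|c(\hat\kappa)e^{\hat\kappa} - c(\kappa)e^{\kappa}| \leq C(\kappa)\{E|\hat\kappa - \kappa|^2\}^{1/2} \leq C(\kappa)n^{-1/2},$$

where the first inequality is by the Hölder inequality and the second is by (6.2). Similarly,

$$E|c(\hat\kappa)e^{-\hat\kappa} - c(\kappa)e^{-\kappa}| = E|(c'(\kappa^*) - c(\kappa^*))e^{-\kappa^*}(\hat\kappa - \kappa)|,$$

and

$$E\left|\frac{e^{\hat\kappa}}{c(\hat\kappa)} - \frac{e^{\kappa}}{c(\kappa)}\right| = E\left|\frac{c(\kappa^*) - c'(\kappa^*)}{c(\kappa^*)^2}e^{\kappa^*}(\hat\kappa - \kappa)\right|.$$

□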



Expressing the estimated $\beta_0$-function is more challenging. For the case of the sphere $S^2$, an exact expression can be obtained. One can calculate that $c(\kappa) = \frac{\kappa}{\sinh(\kappa)}$, and from (5.8),

$$g_\kappa(r) = \frac{e^{\kappa}}{2\sinh(\kappa)} - \frac{1}{2\kappa r},$$

from which we use (5.9) to obtain,

$$\beta_0(x, \kappa) = \frac{e^{2\kappa} - 1}{2\kappa[(1 - x)e^{2\kappa} + x]} \tag{6.3}$$

for $x \in (0, 1]$ and $\kappa > 0$. Notice that $\beta_0(x, \kappa) \to 1$ as $\kappa \to 0$ and $\beta_0(x, \kappa) \to 0$ as $\kappa \to \infty$, for all $x \in (0, 1)$. Furthermore, for (6.1), [MJ00, 9.3.9]

$$A_3(\kappa) = \coth\kappa - \frac{1}{\kappa}. \tag{6.4}$$
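The closed form (6.3) can be sanity-checked against the expression for $g_\kappa(r)$ it was derived from. The sketch below is illustrative and uses hypothetical function names: it evaluates $\beta_0(x, \kappa)$ on $S^2$ and confirms numerically that substituting the result back into $g_\kappa$ returns $x$.

```python
import math

def g(r, kappa):
    """g_kappa(r) on S^2: mass of the superlevel set {f >= 1/r}."""
    return math.exp(kappa) / (2.0 * math.sinh(kappa)) - 1.0 / (2.0 * kappa * r)

def beta0(x, kappa):
    """Closed form (6.3) for the Betti-0 function on S^2."""
    e2k = math.exp(2.0 * kappa)
    return (e2k - 1.0) / (2.0 * kappa * ((1.0 - x) * e2k + x))

kappa = 1.5
for x in (0.1, 0.5, 1.0):
    r = beta0(x, kappa)
    print(x, r, g(r, kappa))  # the last column should reproduce x
```

At $x = 1$ the value of `beta0` equals $e^{\kappa}\sinh(\kappa)/\kappa = 1/\min f_\kappa$, the left endpoint of the Čech Betti-2 interval in (5.11).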

We have the following: 

Theorem 6.2. For the von Mises-Fisher distribution on $S^2$, and fixed $\kappa > 0$,

$$E\|\beta_0(x, \hat\kappa) - \beta_0(x, \kappa)\|_\infty \leq C(\kappa)n^{-1/2},$$

as $n \to \infty$.



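Proof. By the mean value theorem,

$$\beta_0(x, \hat\kappa) - \beta_0(x, \kappa) = \frac{\partial}{\partial\kappa}\beta_0(x, \kappa^*)(\hat\kappa - \kappa), \tag{6.5}$$

where $\kappa^*$ is between $\hat\kappa$ and $\kappa$. One can calculate that

$$\frac{\partial}{\partial\kappa}\beta_0(x, \kappa) = \frac{-(1 - x)e^{4\kappa} + (1 + 2\kappa - 2x)e^{2\kappa} + x}{2\kappa^2[(1 - x)e^{2\kappa} + x]^2}.$$

Recall that the domain of $\beta_0(x, \kappa)$ is $(0, 1]$. For $x \in (0, 1]$, $|\frac{\partial}{\partial\kappa}\beta_0(x, \kappa)|$ is bounded: for instance,

$$\left|\frac{\partial}{\partial\kappa}\beta_0(x, \kappa)\right| \leq \frac{e^{4\kappa} + (1 + 2\kappa)e^{2\kappa} + 1}{2\kappa^2}. \tag{6.6}$$

Combining (6.5), (6.6), (6.2) and (6.4) produces the desired result. □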
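The two ingredients of this proof, the derivative of (6.3) with respect to $\kappa$ and the bound (6.6), can be checked numerically. The sketch below is an illustration with hypothetical function names: it compares the analytic derivative with a central finite difference (step `h` chosen arbitrarily) and prints the bound alongside.

```python
import math

def beta0(x, kappa):
    """Closed form (6.3) for the Betti-0 function on S^2."""
    e2k = math.exp(2.0 * kappa)
    return (e2k - 1.0) / (2.0 * kappa * ((1.0 - x) * e2k + x))

def dbeta0_dkappa(x, kappa):
    """Partial derivative of (6.3) with respect to kappa."""
    e2k, e4k = math.exp(2.0 * kappa), math.exp(4.0 * kappa)
    num = -(1.0 - x) * e4k + (1.0 + 2.0 * kappa - 2.0 * x) * e2k + x
    den = 2.0 * kappa ** 2 * ((1.0 - x) * e2k + x) ** 2
    return num / den

def bound(kappa):
    """Right-hand side of (6.6), which does not depend on x."""
    return (math.exp(4.0 * kappa) + (1.0 + 2.0 * kappa) * math.exp(2.0 * kappa) + 1.0) / (2.0 * kappa ** 2)

kappa, h = 0.8, 1e-6
for x in (0.2, 0.6, 1.0):
    fd = (beta0(x, kappa + h) - beta0(x, kappa - h)) / (2.0 * h)
    print(x, fd, dbeta0_dkappa(x, kappa), bound(kappa))
```

The bound is far from tight, which is harmless here: the proof only needs the derivative to be bounded uniformly in $x$ for $\kappa$ in a neighborhood of the true value.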

6.2. The Watson distribution. Recall that the Watson distribution on $S^{p-1}$ is given by

$$f_{\mu,\kappa}(x) = d(\kappa)\exp\{\kappa(x^t\mu)^2\}, \text{ where } \mu \in S^{p-1} \text{ and } \kappa > 0. \tag{6.7}$$

Let us parametrize $\mu$ using the spherical angles: $\mu = \mu(\phi)$, where $\phi = (\phi_1, \ldots, \phi_{p-1})^t$. Let $X_1, \ldots, X_n$ be a random sample from the Watson distribution.

If we take the sample to be fixed and the underlying parameters to be unknown, then the log-likelihood function of (6.7) is given by:

$$\ell(\phi, \kappa) = n\log d(\kappa) + \kappa\sum_{j=1}^{n}(X_j^t\mu(\phi))^2.$$

The maximum likelihood estimation of $\mu$ and $\kappa$ comes from the estimating equation:

$$\nabla_{\phi,\kappa}\ell(\phi, \kappa) = 0, \tag{6.8}$$

where $\nabla_{\phi,\kappa}$ denotes the gradient. Let $\hat\phi$ and $\hat\kappa$ be the solutions to (6.8), which are the maximum likelihood estimators. Then the standard theory of maximum likelihood estimators [CH74, pp. 294-296] shows that the large sample asymptotics satisfy:

$$\sqrt{n}\left((\hat\phi, \hat\kappa)^t - (\phi, \kappa)^t\right) \to_d N_p(0, I(\phi, \kappa)^{-1}) \tag{6.9}$$

as $n \to \infty$, where "$\to_d$" means convergence in distribution, $I(\phi, \kappa)$ is the Fisher information matrix^7 and $N_p$ stands for the $p$-dimensional normal distribution with given mean and covariance. It turns out that in the case of the Watson distribution,



7(0, «) 











Consequently, from (6.9), we have that

$$\sqrt{n}(\hat\kappa - \kappa) \to_d N\left(0, \left(-\frac{\partial^2}{\partial\kappa^2}\log d(\kappa)\right)^{-1}\right)$$

as $n \to \infty$.



^7 The Fisher information matrix is defined to be $I(\phi, \kappa) = -E\nabla^2_{\phi,\kappa}\ell(\phi, \kappa)$, where $\nabla^2_{\phi,\kappa}$ is the $p \times p$ Hessian matrix.